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13  ABSTRACT 

The  numerical  optimization  of  distributed  parameter  systems  is  considered.  In 
particular  the  adaptation  of  the  Davidon  method,  the  conjugate  gradient  method,  and 
the  "best  step"  steepest  descent  method  to  distributed  parameters  is  presented.  The 
class  of  problems  with  quadratic  cost  functionals  and  linear  dynamics  is  investigated. 
Penalty  functions  are  used  to  render  constrained  problems  amenable  to  these  gradient 
techniques. 

Also  considered  is  an  analysis  of  the  effects  of  discretization  of  continuous 
distributed  parameter  optimal  control  problems.  Estimates  of  discretization  error 
bounds  are  established  and  a  measure  of  the  euboptimality  of  the  numerical  solution  is 
presented. 

Numerical  results  for  both  the  constrained  and  the  unconstrained  optimal  control 
of  the  one-dimensional  wave  equation  are  given.  Both  the  distributed  and  the 
boundary  control  of  the  wave  equation  are  treated.  The  standard  numerical  comparisons 
between  the  Davidon  method,  the  conjugate  gradient  method,  and  the  steepest  descent 
method  are  reported.  It  is  evident  from  these  comparisons  that  both  the  Davidon 
method  and  the  conjugate  gradient  method  offer  a  substantial  improvement  over  the 
steepest  descent  method. 

Some  of  the  numerical  considerations,  such  as,  selection  of  appropriate  finite 
difference  methods,  multiple  quadrature  formulas,  interpolating  formulas,  storage  re¬ 
quirements,  computer  run-times,  etc.  are  also  considered, 
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THE  NUMERICAL  OPTIMIZATION  OF  DISTRIBUTED  PARAMETER 
SYSTEMS  BY  GRADIENT  METHODS 
ABSTRACT 

The  numerical  optimization  of  distributed  parameter  systems  is 
considered.  In  particular  the  adaptation  of  the  bavidon  method,  the 
conjugate  gradient  method,  and  the  "best  step"  steepest  descent  method 
to  distributed  parameters  is  presented.  The  class  of  problems  with 
quadratic  cost  functionals  and  linear  dynamics  is  investigated.  Penalty 
functions  are  used  to  render  constrained  problems  amenable  to  these 
gradient  techniques. 

Also  considered  is  an  analysis  of  the  effects  of  discretization  of 
continuous  distributed  parameter  optimal  control  problems.  Estimates 
of  discretization  error  bounds  are  established  and  a  measure  of  the 
suboptimality  of  the  numerical  solution  is  presented. 

Numerical  results  for  both  the  constrained  and  the  unconstrained 
optimal  control  of  the  one-dimensional  wave  equation  are  given.  Both 
the  distributed  and  the  boundary  control  of  the  wave  equation  are  treated 
The  standard  numerical  comparisons  between  the  Davidon  method,  the 
conjugate  gradient  method,  and  the  steepest  descent  method  are  reported. 
It  is  evident  from  these  comparisons  that  both  the  Davidon  method  and 
the  conjugate  gradient  method  offer  a  substantial  improvement  over  the 
steepest  descent  method. 

Some  of  the  numerical  considerations,  such  as,  selection  of  ap¬ 
propriate  finite  difference  methods,  multiple  quadrature  formulas, 
interpolating  formulas,  storage  requirements,  computer  run-times,  etc. 
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CHAPTER  I.  INTRODUCTION 

The  optimal  control  of  distributed  parameter  systems  is 
concerned  with  the  minimization  (maximization)  of  functionals 
constrained  by  either  nonhomogeneous  partial  differential 
equations  or  by  multiple  integral  equations.  The  study  of 
the  optimal  control  of  distributed  parameter  systems  is 
generally  considered  to  have  been  initiated  in  1960  by 
Butkovskii  and  Lerner  (1).  The  term  "distributed  parameter 
system"  was  coined  by  Butkovskii  and  was  intended  to  refer  to 
dynamical  systems  which  are  modeled  by  either  partial  dif¬ 
ferential  equations  or  by  multiple  integral  equations.  In 
fact,  it  is  these  distributed  constraints  which  distinguishes 
this  field  from  the  classical  multi-variable  calculus  of 
variations,  which  was  considered  by  Lagrange  as  early  as  1806. 

The  optimal  control  of  distributed  parameter  systems 
has  received  considerable  attention  in  recent  years.  Since 
Butkovskii  and  Lerner 's  original  paper,  well  over  two  hundred 
publications  have  appeared  in  the  literature  concerned  with 
this  subject.  At  the  present  time  two  full  length  books 
(2,  3)  have  been  published  on  this  topic  and  more  are  in 
preparation.  In  addition  many  of  the  recent  texts  on  optimi¬ 
zation  include  chapters  introducing  this  subject  (4,  5).  Also, 
a  number  of  doctoral  dissertations  have  reported  results  on 
various  aspects  of  this  field  (6,  7,  8  and  9). 

The  rapid  growth  of  literature  concerning  this  topic  has 
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motivated  a  number  of  recent  survey  papers  on  this  subject, 
which  include  extensive  bibliographies.  The  first  survey  of 
the  subject  was  generated  by  Wang  (10)  in  1964  and  still 
provides  a  good  introduction  to  the  subject.  Subsequently, 
Wang  published  an  extensive  bibliography  covering  both  the 
stability  and  the  optimal  control  of  distributed  parameter 
systems  (11).  In  1968  Butkovskii,  Egorov  and  Lurie  (12) 
published  an  excellent  survey  of  the  Soviet  efforts  in  this 
field.  In  1969,  Robinson  compiled  what  is  probably  the  most 
complete  bibliography  of  this  subject  to  date  (13) .  Robinson 
includes  in  his  bibliography  a  brief  discussion  of  many  of 
the  various  facets  of  this  topic.  Due  to  the  existence  of 
these  recent  survey  papers,  only  a  brief  introduction  to 
distributed  parameter  systems  will  be  given;  and  a  complete 
bibliography  will  be  omitted. 

At  the  present  time  there  is  no  universally  accepted 
method  for  classifying  the  works  published  on  this  subject. 

A  number  of  possibilities  are  discussed  in  (13) .  In  the  sub¬ 
sequent  discussion,  the  publications  will  be  divided  into  two 
groups:  (1)  those  papers  which  are  primarily  concerned  with 

the  mathematical  structure  of  the  problem  formulation;  and 
(2)  those  results  which  are  primarily  concerned  with  the 
problem  solution. 
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Results  Concerning  the  Problem 
Fo rmulation 

The [majority  of  the  papers  on  the  optimal  control  of  dis¬ 
tributed  parameter  systems  deal  primarily  with  the  extensions 
of  the  theoretical  results  obtained  for  lumped  parameter 
systems  to  distributed  parameter  systems.  In  fact  about  one- 
half  of  all  the  reports,  which  have  appeared  in  the  literature, 
are  concerned  with  the  problem  formulation  giving  particular 
attention  to  the  derivation  of  the  necessary  conditions  for 
optimality.  Three  basic  approaches  have  been  utilized  in 
the  derivation  of  these  necessary  conditions:  (1)  variational 
calculus;  (2)  dynamic  programming;  and  (3)  functional  analysis. 
In  addition,  there  is  the  method  of  moments  which  can  be  used 
if  che  functional  is  constrained  by  a  system  of  linear  integral 
equations  (14).  In  his  early  works,  Butkovskii  (15,  16)  con¬ 
siders  systems  described  by  integral  equations  and  employs 
variational  methods  to  derive  the  necessary  conditions  for 
optimality.  These  necessary  conditions  are  given  in  the  form 
of  integral  equations.  A  number  of  Butkovskii' s  subsequent 
works  are  concerned  with  the  development  of  methods  for  solving 
these  integral  equations.  Egorov  (17)  and  Lurie  (18)  follow 
the  work  of  Butkovskii;  however,  they  consider  systems  described 
by  partial  differential  equations.  Both  of  these  approaches 
have  their  advantages  and  their  disadvantages.  Since  the 
integral  representation  of  the  dynamical  system  yields  bounded 
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operators,  this  approach  is  useful  in  theoretical  develop¬ 
ments.  However,  the  differential  representation  of  dynamical 
systems,  which  unfortunately  introduces  unbounded  operators, 
is  useful  because  many  physical  problems  are  easily  formulated 
in  terms  of  partial  differential  equations.  In  principle  at 
least,  Green's  functions  can  be  employed  to  convert  linear 
partial  differential  equations  into  linear  integral  equations. 
However,  in  practice  this  is  not  always  possible,  certainly 
not  in  general  for  the  non-linear  case. 

Wang  was  one  of  the  first  to  use  dynamic  programming  to 
derive  the  necessary  conditions  for  distributed  parameter 
systems  with  distributed  control.  Brogan  (6)  extends  Wang's 
results  to  include  boundary  control.  The  functional  analysis 
methods  are  generally  applied  to  abstract  optimal  control 
problems  in  either  a  Banach  space  or  in  a  Hilbert  space.  With 
this  degree  of  generality,  the  results  obtained  in  these 
papers  certainly  can  be  applied  to  distributed  parameter 
systems  and  to  lumped  parameter  control  problems  as  well. 
Papers  by  Balakrishnan  (19,  20)  are  of  particular  interest  to 
the  present  investigation;  since  in  these  papers,  Balakrishnan 
considers  the  extension  of  the  classics  \  steepest  descent 
method  to  the  general  Hilbert  space  setting.  Russell  (21) 
applies  functional  analysis  methods  to  problems  in  which  the 
controls  are  finite-dimensional.  Axelband  (22)  utilizes  the 
Frechet  derivative  to  obtain  the  necessary  conditions  for  a 
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quadratic  functional.  He  then  proceeds  to  develop  methods 
for  solving  the  resulting  linear  operator  equation.  A 
number  of  authors,  for  example  (23,  24  and  25)  utilize 
certain  properties  of  special  classes  of  problems  to  develop 
methods  for  obtaining  the  optimal  control. 

Results  Concerning  the  Problem 
Solution 

In  the  optimization  of  distributed  parameter  systems, 
one's  first  impulse  is  to  transform  the  problem  into  some 
other  form  which  can  be  solved  by  existing  techniques.  This 
approach  leads  ultimately  to  some  type  of  approximation.  At 
the  present,  the  following  approximations  have  been  tried: 

(1)  eigenvector  expansion;  (2)  spacial  discretization;  and 
(3)  space-time  discretization.  Of  course,  the  eigenvector 
(eigenfunction  is  the  term  usually  used  in  this  case)  ex¬ 
pansion  techniques  are  classical  methods  of  approximating 
partial  differential  equations.  Unfortunately,  this  method 
only  works  for  linear  or  linearized  problems  with  rather 
restrictive  boundary  conditions,  when  this  method  does 
apply,  the  distributed  parameter  problem  is  reduced  to  a 
lumped  parameter  problem.  An  approximation  is  introduced  when 
the  eigenvector  expansion  is  truncated  to  a  finite  number  of 
terms  in  order  to  facilitate  a  practical  solution.  Lukes  and 
Russell  (26)  prove  that  the  solution  obtained  from  the 
truncated  eigenvector  expansion  converges  to  the  exact  solu- 
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tion  of  the  distributed  parameter  problem  as  the  number  of 
terms  in  the  expansion  increases.  Space  discretization  also 
reduces  the  problem  to  an  approximate  lumped  parameter 
problem.  However,  a  very  large  number  of  ordinary  differential 
equations  result  from  this  method;  and  conventional  lumped 
parameter  methods  have  not  proven  to  be  very  effective  in  this 
case.  Axelband  (22)  uses  space-time  discretization  to  reduce 
the  problem  to  a  parameter  optimization  problem.  However, 
once  again  the  number  of  independent  variables  becomes  ex¬ 
tremely  large;  and  difficulties  are  encountered  in  obtaining 
the  solution  with  standard  techniques.  Axelband  also  proves 
that,  in  the  limit  this  method  converges  to  the  true  solution. 
However,  in  doing  so,  he  neglects  to  consider  the  numerical 
approximations  and  their  effects  on  the  convergence  of  the 
method.  Recently,  a  number  of  authors  (27,  28,  29  and  30) 
have  alluded  to  the  fact  that  some  of  the  direct  computational 
methods  developed  for  the  solution  of  lumped  parameter  opti¬ 
mization  problems,  especially  gradient  methods,  might  also  be 
beneficially  extended  to  distributed  parameter  systems.  At 
the  present  time  computational  experience  with  the  steepest 
descent  method,  as  related  to  the  optimization  of  distributed 
parameter  systems,  has  been  reported  in  (28,  29  and  30). 

These  methods  offer  the  advantages  of  being  very  simple  and 
of  applying  to  a  broad  class  of  problems. 
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Engineering  Applications 


In  recent  years  considerations  of  the  control  of  complex 
processes, such  as  nuclear  reactors  and  chemical  production 
systems,  have  motivated  interest  in  the  optimal  control  of 
distributed  parameter  systems.  However,  these  are  not  the 
only  possible  applications  for  this  theory.  For  example,  it 
is  clear  that  an  optimal  control  theory  for  distributed 
parameter  systems  can  be  applied  to  the  process  industries 
(chemical,  petroleum,  steel,  cement,  glass,  etc.),  the  power 
industry,  and  the  aerospace  industry.  The  following  list  is 
not  complete;  nevertheless,  it  does  indicate  the  variety  of 
problem  areas  to  which  the  optimal  control  theory  of 
distributed  parameter  systems  could  be  applied: 

1.  Control  of  heat  and  mass  transfer  processes  (e.g., 
heating,  cooling,  melting,  drying,  etc.). 

2.  Control  of  fluid  dynamic  processes  (e.g.,  pumping 
of  petroleum,  hydroelectric  power  generation, 
liquid  rocket  engine  design,  acoustic  phenomena, 
etc. ) . 

3.  Control  of  chemical  and  kinetic  reactions  (e.g., 
petroleum  refining,  production  of  steel  and  glass, 
combustion  processes,  chemical  industries,  etc.). 

4.  Control  of  elastic  and  viscoelastic  vibrations  (e.g., 
heavy  equipment  industry,  aerospace  industry, 
geographic  applications,  location  of  petroleum 
deposits ,  etc. ) . 

5.  Control  of  nuclear  and  atomic  processes  (e.g., 
nuclear  power  industry,  nuclear  space  propulsion 
systems,  nuclear  energy  propagation,  etc.). 

6.  Control  of  radioactive  processes  (e.g.  ,  radiation 
shielding,  optical  and  electro-magnetic  communica¬ 
tions  ,  etc . ) . 
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7.  Control  of  hydrodynamical  and  magnetohydrodynamic 
processes . 

8.  Control  of  spacecraft  attitude  (e.g.,  heat  dissi¬ 
pation,  structural  effects,  etc.). 

9.  Control  of  melting,  freezing,  and  crystal  growth. 

10.  Control  of  environmental  processes  (e.g.,  air 

pollution,  water  pollution,  flood  control,  traffic 
control,  forest  fire  control,  etc.). 

After  examining  the  above  list,  it  becomes  immediately  apparent 
that  there  is  no  lack  of  motivation  (from  the  point  of  view 
of  applications)  for  the  theory  of  optimal  control  of  dis¬ 
tributed  parameter  systems. 

Dissertation  Objectives 

As  mentioned  before,  many  of  the  existing  results  to 
date  are  concerned  with  the  mathematical  structure  of  the 
problem  and  the  derivation  of  the  necessary  conditions  for 
optimality.  Unfortunately,  very  little  has  been  said  con¬ 
cerning  how  to  solve  these  necessary  conditions  to  obtain  the 
optimal  control.  From  the  engineering  point  of  view,  the 
problem  solution  is  at  least  as  important  as  the  problem 
formulation.  Therefore,  it  seems  desirable  that  a  large 
amount  of  future  research  efforts  should  be  devoted  to  the 
development  of  methods  for  solving  the  problems  already 
formulated. 

One  of  the  original  objectives  of  the  present  research 


was  to  demonstrate  numerically  that  the  second  generation 
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gradient  methods,  such  as  the  conjugate  gradient  method  and 
the  Davidon  method,  could  be  efficiently  adapted  to  solve 
practical  distributed  parameter  problems.  These  methods  were 
selected  because  of  their  simplicity,  their  generality,  and 
their  success  in  solving  lumped  parameter  optimal  control 
problems.  However,  preliminary  storage  requirement  calcu¬ 
lations  indicate  that  the  solution  of  realistic  distributed 
parameter  optimization  problems  are  beyond  the  present 
storage  capabilities  of  the  Model  360-65  system.  In  addition, 
early  numerical  results  indicate  that  the  approximations  in¬ 
volved  in  discretizing  the  continuous  problem  are  causing 
substantial  errors  in  the  approximate  solution.  It  was 
realized  that  in  order  to  effectively  solve  distributed  optimal 
control  problems  by  gradient  methods,  it  is  essential  to 
determine  the  effects  of  these  approximations  on  the  numerical 
solution.  The  new  objectives  formulated  arer  (1)  to  develop 
a  general  optimization  theory  for  a  particular  class  of 
distributed  parameter  problems?  (2)  to  isolate  those  approxi¬ 
mations  which  cause  the  largest  errors  in  the  numerical  solu¬ 
tion  for  this  class  of  problems;  (3)  to  determine  the  effects 
of  these  errors  on  the  class  of  gradient  methods;  (4)  to 
develop  estimates  for  the  errors  between  the  exact  and  the 
approximate  solution;  (5)  to  evaluate  the  effectiveness  of  the 
conjugate  gradient  method  and  the  Davidon  method  in  comparison 
with  the  standard  steepest  descent  method  on  this  class  of 
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problems?  and  (6)  to  generate  numerical  results  which  sub¬ 
stantiate  the  theory  developed  in  objectives  (1)  through  (5). 

Class  of  Problems  Considered 

The  non-linear  distributed  parameter  optimal  control 
problem  is  easily  formulated;  and  if  the  existence  of  a  rela¬ 
tive  minimum  is  assumed,  the  derivation  of  the  necessary  con¬ 
ditions  for  optimality  is  straight  forward.  However,  the 
solution  of  a  non-linear  distributed  parameter  optimal  control 
problem  is  usually  very  difficult.  Existence  and  uniqueness 
considerations  for  both  the  minimizing  element  and  the 
distributed  dynamical  system  dictate  that  extreme  care  be 
exercised  in  the  selection  of  the  class  of  problems  to  be 
considered. 

Fortunately,  gradient  methods  do  not  require  the  a 
priori  assumption  of  the  existence  of  a  relative  minimum. 
However,  they  do  require  the  existence  and  uniqueness  of  the 
solutions  of  the  dynamical  system.  Thus,  the  selection  of  the 
distributed  dynamical  system  to  be  optimized  is  an  important 
consideration  in  distributed  parameter  problems. 

The  second  generation  gradient  methods  are  basically  un¬ 
constrained,  quadratic  functional,  optimization  methods.  Thus, 
it  seems  natural  to  investigate  their  performance  on  quadratic 
distributed  parameter  problems,  especially,  since  quadratic 
problems  play  such  a  significant  role  in  the  present  state  of 
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the  art  of  distributed  parameter  systems.  The  penalty  function 
approach  can  be  used  to  alter  the  constrained  distributed 
parameter  optimal  control  problem  into  an  unconstrained,  quad¬ 
ratic  functional,  optimization  problem?  if:  (1)  the  penalty 

functional  is  quadratic;  (2)  the  original  cost  index  is 
quadratic;  and  (3)  the  distributed  parameter  dynamical  system 
is  linear.  In  the  following,  only  problems  with  the  above 
properties  will  be  considered;  and  will  be  referred  to  as 
quadratic  programming  problems. 

Dissertation  Outline 

The  distributed  parameter  optimal  control  problem  is 
formulated  in  Chapter  II.  The  concept  of  a  functional  deri¬ 
vative  is  utilized  to  derive  the  expression  for  the  gradient 
of  the  cost  index.  Brief  remarks  are  made  concerning  methods 
which  use  the  gradient  to  obtain  the  optimal  control. 

Gradient  methods  are  introduced  in  Chapter  III. 
Specifically,  an  introduction  of  the  three  most  popular 
gradient  methods  is  presented.  The  concepts  of  the  inner  and 
outer  loop  iterations  are  discussed,  and  popular  inner  loop 
iterators  are  introduced. 

The  development  of  an  approximation  theorv  for  the 
numerical  solution  of  distributed  parameter  systems  by 
gradient  methods  is  presented  in  Chapter  IV.  The  definitions 
of  the  Optimal  Control  Error  and  the  Cost  Functional  Error 
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are  introduced.  It  is  shown  that  the  approximations  involved 
in  the  discretization  of  the  continuous  problem  cause  gradient 
errors.  The  effects  of  gradient  error  on  gradient  methods  is 
analyzed.  Error  estimates  for  the  approximate  numerical 
solution  are  developed.  A  geometrical  interpretation  of  these 
error  estimates  is  presented. 

Chapter  V  presents  numerical  results  for  both  the  con¬ 
strained  and  the  unconstrained  optimal  control  of  the  one¬ 
dimensional  wave  equation.  Both  ■  tributed  control  and 
boundary  control  are  considered.  Penalty  functions  are  used 
to  render  the  constrained  problem  amenable  to  the  gradient 
methods.  Standard  numerical  comparisons  between  the  conjugate 
gradient  method,  the  Davidon  method ,  and  the  steepest  descent 
method  are  given.  Some  of  the  numerical  considerations,  such 
as  selection  of  appropriate  finite-difference  methods, 
multiple  quadrature  formulas,  storage  requirements,  computer 
run  times,  etc.,  are  discussed 

Concluding  remarks  and  recommendations  for  additional  re¬ 
search  are  given  in  Chapter  VI. 

Appendix  A  introduces  mathematical  concepts  which  are 
pertinent  to  this  dissertation.  The  coverage  of  these  topics 
is  extremely  brief;  consequently,  it  is  not  intended  to  be  an 
introduction  to  any  of  the  areas  discussed,  but  rather  as  a 
point  of  reference  for  the  development  presented  in  the  main 
body  of  this  work. 
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In  Appendix  B  the  derivation  of  the  necessary  conditions 
for  a  general  non-linear  distributed  parameter  optimal  control 
problem  is  presented.  Ordinary  differential  equations  on  the 
spatial  boundary  are  considered.  The  standard  calculus  of 
variations  is  employed  in  the  derivation  of  the  necessary  con¬ 
ditions  for  optimality. 
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CHAPTER  II.  THE  OPTIMAL  CONTROL  OF  DISTRIBUTED 
PARAMETER  SYSTEMS 
The  Optimal  Control  Problem 

The  optimal  control  problem  may  be  stated  as  follows: 
minimize 

J[u;x],  (2.1) 

subject  to 

>  0/  xeX  and  ueUj  (2.2) 

where  X  is  called  the  state  space,  U  the  set  of  admissible 
controls,  and  J[u;x]  is  a  real  valued  functional  defined  on 
the  product  space  U  x  X.  The  non-linear  operator  (J;  is  de¬ 
fined  on  U  x  X,  and  0  is  the  null  vector  of  this  product  space. 
The  functional  J  [  * ; - ]  is  generally  referred  to  as  the  cost 
index,  and  the  conditions  of  Equation  2.2  are  called  the 
constraints.  As  a  consequence  of  the  constraints,  the  state 
trajectory  x(t)  is  dependent  upon  the  control  u.  Thus  any 
particular  optimal  control  problem  depends  on  the  nature  of 
the  functions J  and  and  on  the  sets  X  and  U.  Consider 
the  following  special  cases:  1.  Parameter  Optimization: 
let  X  and  U  be  real  Euclidean  vector  spaces,  let  be  a  vector 
valued  function,  and  let  J  be  a  scalar  valued  function; 

2.  Lumped  Parameter  Optimization:  let  X  and  U  be  properly 
selected  function  spaces,  let  ^  be  decomposed  into  two 
operators  T  and  S,  where  T  represents  an  algebraic  equality 
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and/or  inequality  constraint,  and  where  S  denotes  a  differen¬ 
tial  and/or  integral  operator  with  respect  to  one  variable, 
and  let  J  be  a  real  valued  functional;  3.  Distributed 
Parameter  Optimization;  let  X  and  U  be  properly  selected 
function  spaces,  let  if/  be  composed  of  algebraic,  differential 
and/or  integral  operators  with  respect  to  more  than  one 
variable,  and  let  J  be  a  real  valued  functional.  The  solu¬ 
tion  of  problems  formulated  in  case  3  (above)  is  the  topic 
of  this  dissertation. 

The  difficulties  encountered  in  solving  for  the  optimal 
control  of  a  distributed  parameter  system  are  generally 
related  to  the  complexity  of  the  constraints,  Equation  2.2. 

For  distributed  parameter  systems  very  few  general  results 
are  available  concerning  the  existence  and  uniqueness  of  the 
solution  to  the  constraint  equation.  Consequently,  little 
can  be  said  regarding  the  solution  of  the  general  distributed 
parameter  problem.  However,  there  exist  certain  classes  of 
problems,  of  practical  significance,  for  which  results  can  be 
obtained.  For  one  such  important  class  one  lets:  (1) 

J[u;x]  be  a  quadratic  functional  in  u  and  x;  and  (2)  ^[u;x]  be 
a  linear  equality  constraint.  It  is  this  particular  class  of 
distributed  parameter  problems  which  will  be  considered 


k 


in  what  follows. 
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The  Distributed  Parameter  Optimal 
Control  Problem 


Cost  index 

Let  J[u;x]  be  a  real  valued  quadratic  functional  defined 
on  the  real  separable  Hilbert  space  U  x  x,  generated  by  the 
self-adjoint  operators  M  and  N,  and  by  the  inner  product 
<•,*>,  and  let  J[u;x]  be  specified  by 

J(u;x]  =  cQ  +  <c1#u>  +  ■|-<u,Mu>  +  <c2,x>  +  i<x,Nx>, 

(2.3) 


or 


J[ 

ud' 

£ 

u 

o 

0 

+ 

A 

\°t 

b 

t 

ud 

V 

[clJ 

.“b. 

+ 


'IWfllV 


> 


+  <C2,X>  +  ^<x,Nx>, 


(2.4) 


where  XCL2[nxT],  UCL2[J2xT],  fiCRm,  TCR1,  CgER1,  c^eU, 
c2eX,  and  where  the  vector  c^  and  the  operator  M  are  parti¬ 


tioned  into 


d 


and  [Md | M^] ,  respectively .  At  any  time  teT 


the  distributed  state  of  the  system  is  denoted  by 
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x  (r ,  t)  = 


xl(rl'r2 . rm't) 


xn(rl'r2 . V1’ 


(2.5) 


where  ref),  and  where  each  component  ^ri'r2'  •  •  •  ,rm»t)  ex, 
i=l,2,...,n.  Let  the  control  vector  u  denote  both  the  dis¬ 
tributed  control  and  the  boundary  control,  that  is 


u  = 


ud  (r ,  t) 


ub(rb'° 


(2.6) 


The  distributed  control  vector  ud  is  represented  by 


ud(r,t)  = 


rud(rl,r2' - ‘ * ,rm't) 


Lud(rl'r2"  *  *  ,rm,t) 


(2.7) 


where  each  component  ud (r^ , r^ , . . • , t) eU ,  i=l , 2 , . . . , p£n . 
The  boundary  control  vector  u^  is  represented  by 


Vvu  * 


rub<rb’rb . 


k  ,  1  2  m  ,  . . 

.Wrb . rb'uJ 


(2.8) 


where  r^eSf),  and  each  component  u^  (r^  ,r^ , .  /  .  ,  r™ ,  t)  eU , 


i=l , 2 , . . . ,k  <n . 


Constraints 

Let  the  constraint  <Mu;x]  be  decomposed  into  the  linear 
distributed  dynamical  system 
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Sx(r,t)  =  ud(r,t)  (2.9) 

with  initial  condition 

x(r,0)  =  Xq (r) ,  (2.10) 

and  terminal  condition  (target  set) 

4'1x(r,Tf)  -  xD(r)  ,  (2.11) 

and  into  the  boundary  condition 

Tx(rfe,t)  =  u^fr^t),  (2.12) 


where  S  and  T  are  linear  differential  operators  consisting  of 
a  linear  combination  of  a  time  differential  operator  ,  and 
a  spatial  differential  operator  Dr,  given  by 

S  =  CjDt  +  c2Dr  ,  (2.13) 

T  =  ^ c3°t  +  c4Dr'1  |  3Q  '  cieRl'  (2.14) 

and  4^  is  a  nxn  self-adjoint  matrix,  and  xD(r)  is  the  desired 
state  of  the  final  time.  In  addition  it  is  assumed  that 
Equations  2.9,  2.10,  2.11,  and  2.12  satisfy  the  conditions 
of  Theorem  1.1  in  (20),  which  insures  the  well-posedness 
of  the  dynamical  system,  and  the  representation  of  the  solution 
in  terr  of  integral  operators.  The  distributed  optimal 
control  problem  for  the  system  of  Equations  2.9,  2.10,  2.11, 
and  2.12  with  respect  to  the  cost  index  J,  and  the  set  of 
admissible  controls  U  can  now  be  restated  as  follows: 
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determine  the  control  u*eU  such  that 

J[u*;x(u*)]  =  min{J [u;x (u) ] }  .  (2.15) 

ueU 

Conditions  for  Optimality 
Existence  of  an  optimum 

For  the  class  of  problems  considered  the  existence  and 
the  uniqueness  of  the  minimizing  element  u*  can  be  easily 
proven  (31).  This,  of  course,  is  certainly  not  the  case 
for  the  general  distributed  parameter  optimal  control  problem, 
since  existence  and  uniqueness  results  for  even  the  dynamical 
system  do  not  (in  general)  exist.  The  existence  and  unique¬ 
ness  of  the  solution  was  one  of  the  primary  reasons  for  the 
selection  of  this  particular  class  of  problems,  as  the 
subject  of  the  present  investigation. 

Derivation  of  the  necessary  conditions  for  optimality 

The  numerical  methods  which  are  used  in  this  dissertation 
are  directly  applicable  to  only  the  unconstrained  problem. 
Thus,  it  is  convenient  to  transform  this  constrained 
problem  into  some  equivalent  unconstrained  problem. 

Assume  that  the  distributed  dynamical  system  defined  by 
Equations  2.9  through  2.12  satisfy  the  conditions  of  Theorem 
1.1  (20);  this  insures  that  the  dynamical  system  has  a  unique 
solution  for  all  ueU.  However,  only  the  controls  in  a  subset 
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UyCU  drive  the  system  from  the  initial  state  to  the  target 
set.  Therefore,  the  specification  of  a  target  set  causes  the 
dynamical  system  to  generate  a  constraint  in  the  control 
space.  Hence,  strictly  speaking  only  the  controls  ueU^  are 
admissible,  since  if  uetU^]^  the  u  does  not  satisfy  all  of 
the  constraints. 

The  penalty  function  method  will  be  employed  to  render 
the  constrained  problem  amenable  to  gradient  methods.  The 
original  problem  is  then  replaced  by  an  equivalent  un¬ 
constrained  problem.  The  only  requirement  of  a  penalty 
function  is  that  it  be  a  positive  measure  of  the  constraint 
violation.  Thus  for  any  particular  problem,  the  penalty 
function  is  not  unique.  In  the  subsequent  development  the 
following  penalty  function  will  be  utilized: 

P[x(r,Tf)]  =  <'F,W't'>=<y1x(r,Tf)-xD(r)  ,W(4'1x(r,Tf)-xD(r)  )> 

(2.16) 

where  W  is  a  nxn  self-adjoint  matrix  of  penalty  constants. 

The  constrained  prc  blem  may  be  restated  as  an  uncon¬ 
strained  problem  as  follows: 
minimize 

.r  [u;x]  =  J[u;x]  +  P[x(r,T  )]  (2.17) 

P  F 

subject  to  Equations  2.9,  2.10,  and  2.12  (Note:  Equation 
2.11  is  omitted).  In  the  unconstrained  problem  the  dynamical 
system  is  not  a  constraint,  but  rather  a  side  condition,  which 
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By  representing  the  solution  of  the  state  system  in  terms 
of  appropriate  Green's  functions  it  is  possible  to  remove  the 
explicit  dependence  of  the  cost  index  on  the  state  trajectory. 
Thus,  let  the  solution  of  the  dynamical  system  exist,  be 
unique,  and  have  the  representation 

x(r,t)  =  <Mt)x0(r)  +  s“3(t)ud(r,t)  +  T  }t)ub(rb,t),  (2.18) 

where  <t>(t)xQ(r)  denotes  the  contribution  to  the  solution  at. 

time  t  due  to  the  initial  conditions,  S  ^t)ud(r,t)  denotes 

the  contribution  to  the  solution  at  time  t  due  to  the  di s- 

-1 

tributed  control,  and  T  (^ii^r^t)  denotes  the  contribution 
to  the  solution  at  time  t  due  to  the  boundary  control.  The 
state  of  the  system  at  the  time  is  then  denoted  by 

-1  -1 

x(r,Tf)  =  $  (Tf )  Xq  (r)  +  S  (Tf)ud(r,t)  +  T  (Tf )  ub  (rfa  ,  t )  . 

(2.19) 

The  state  trajectory  is  eliminated  from  the  penalized 
cost  index  by  substituting  Equations  2.18  and  2.19  into 
Equation  2.17,  i.e., 

-1  -1 

Jp[u?x]  =  J  [u;4>  (t)  xQ+S  (t)ud+T  (t)ub] 

-1  -1 

+  P[$(Tf)xQ  +  S  (Tf)ud  +  T  (Tf)ub).  (2.20) 

Simplification  of  Equation  2.20  yields  the  standard  quadratic 
form 
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Jp[u]  =  JQ  +  <c,u>  +  j<u,Au>, 


(2.21) 


where 


J0  =  c0  +  <(**(t)c2-$*(Tf)'l'*[W*+W])xD,x0> 


+  <(^^-  +  4'1*(Tf))x(),  (HlilL  +w¥1$(Tf)) 


X0> 


+  <xd/Wxd><v 


(2.22) 


* 

LC2. 


_d ,  v  ■>  *  ■  '1'-”^ 


C£+[S  (t)]  C2+[j(S  ft) 1  (N*+N)]*(t)x 


+  [  [S^T 


*  *  * 


f)  ]  (t'1[W  +W]4'1]^(Tf  )x 


“IS  ^Tf)]%*[W*+W]xD 


C^+[T  U)]*C2+[i[T-:(t)]*  (N*+N)  ]»(t)x 


+  ([T  (Tf)  ]*yJ[W*+W]'l'1]*(Tf)x 


-1 


*  ★  ★ 


[T  (Tf )  ]  'F1  [W  +W]xD 


(2.23) 


A  = 


'll  12 


A21  A22 


(2.24) 


with 


All*Md+[S"^t)  ]*NS""^t)+2[S"^Tf)  ]*4'lW4'lS  (2.25) 

-1  *  -1  -1  *  *  -1 
A12-[S  (t)]  NT  (t)+2[S  (Tf)3  YjWVjT  (Tf )  ,  (2.26) 

A2i»[T“^t)  ]*NS  )  +2  (T  *Tf)]*4'JwH'1s“'ta,f)  ,  (2.27) 

-1  *  -1  -1  *  *  -1 
A22=Mb+[T  (t)]  NT  <t)+2IT  <Tf>l  ’i'1W4'1T  (T f )  .  (2.28) 


The  necessary  condition  for  u*  to  be  the  element  of  U 

which  minimizes  Jp  is  that  the  gradient  (utilizing  the  Frechet 

derivative)  of  J  with  respect  to  the  control  u  vanish  at  u*. 
P 

Thus 

=  C  +  i[A+A’",]u*  =  0.  (2.29) 

u* 

If  A  is  a  self-adjoint  operator,  then  Equation  2.29  yields 

g(u*)  =  c  +  Au*  =  0  .  (2.30) 

From  Equations  2.25  through  2.28  it  follows  that  for  A  to  be 
self-adjoint,  M^,  N,  and  W  must  be  self-adjoint.  If 

A  is  self-adjoint  then  the  Hessian  of  Jp  given  by 


£ 


u’ 


=  g  (u* )  = 


l3uh 
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is  positive  definite.  Consequently  Equations  2.30  and  2.31 
are  necessary  and  sufficient  conditions  for  u*  to  exist  and 
be  the  unique  optimal  control  for  the  penalized  cost  index  J  . 

Methods  of  Solution 

This  dissertation  is  not  primarily  concerned  with 
formulating  necessary  conditions  for  optimality,  but  rather 
in  developing  practical  methods  for  solving  distributed 
parameter  optimal  control  problems.  Thus,  a  brief  introduction 
of  the  basic  optimization  methods  is  warranted.  In  the  opti¬ 
mization  literature  two  basic  classifications  for  the  methods 
of  solution  have  evolved.  These  categories  are  generally 
referred  to  as  the  direct  and  the  indirect  methods. 

Indirect  methods 

Indirect  methods  are  those  methods  which  determine  the 
optimal  control  by  indirectly  solving  (in  most  cases 
iteratively)  the  operator  equation 

g(u)  =  0  .  (2.32) 

In  general  Equation  2.32  is  used  to  eliminate  the  control  from 
the  state  and  costate  systems.  Once  the  control  is  elimi¬ 
nated,  the  state  and  the  costate  systems  form  the  classical 
two  point  boundary  value  problem  (TPBV) .  The  optimal  control 
can  be  determined  once  this  TPBV  problem  is  solved.  Most 
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indirect  methods  are  characterized  by  an  iterative  modification 
of  either  the  boundary  conditions  and/or  the  partial  dif¬ 
ferential  equations. 

Direct  methods 

Direct  methods  are  those  methods  which  determine  the 
optimal  control  by  directly  operating  on  the  cost  index  J . 

Based  on  information  concerning  J  and  possibly  the  gradient 
of  J  the  direct  methods  result  in  an  iterative  procedure 
which,  hopefully,  converges  to  the  optimal  control.  These 
methods  require  an  initial  guess  to  start  the  iteration,  and 
then  correct  this  initial  guess  in  a  certain  predetermined 
manner.  The  various  direct  methods  differ  principally  in 
the  means  used  to  determine  the  control  correction.  The 
gradient  methods  which  are  certainly  the  most  popular  of  this 
class  of  direct  methods  will  be  discussed  in  more  detail  in 
the  following  chapter. 
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CHAPTER  III.  INTRODUCTION  TO 
GRADIENT  METHODS 

Gradient  methods  are  direct  optimization  methods  which 
utilize  derivative  information  during  the  iteration.  The 
most  well-known  of  the  classical  gradient  methods  are  the 
steepest  descent  method  and  the  Newton- Raphson  method.  The 
steepest  descent  method  is  a  first  order  method  (i.e.,  it 
uses  first  derivative  information)  which  is  characterized  by 
simple  logic,  stability  with  respect  to  the  initial  guess,  and 
slow  convergence  near  the  solution.  In  contrast  to  the 
steepest  descent  method  is  the  Newton- Raphson  method,  a 
second  order  method  which  exhibits  rapid  convergence  near 
the  solution,  but  poor  stability  with  respect  to  the  initial 
guess.  In  recent  years  a  class  of  second  generation  gradient 
methods  have  been  developed  which  combine  the  simplicity  and 
stability  of  the  first  order  methods  with  the  convergence 
properties  of  the  second  order  methods.  The  most  popular  of 
this  class  of  gradient  methods  are  the  conjugate  gradient 
method  and  the  Davidon  method.  Although,  the  motivation  for 
each  of  these  two  methods  is  different,  their  performance  is 
strikingly  similar.  In  fact,  these  two  methods  (theoretical¬ 
ly)  produce  identical  iterations  on  quadratic  problems  (32) . 

At  the  present  time  only  the  standard  steepest  descent 
method  has  been  adapted  to  the  optimization  of  distributed 
parameter  systems.  In  this  dissertation  the  numerical 
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adaptation  of  the  conjugate  gradient  method  and  the  Davidon 
method  to  distributed  parameter  systems  is  presented < 

In  general/  gradient  methods  are  employed  to  design 
computer  algorithms  which  are  used  to  obtain  approximate 
solutions  to  optimization  problems.  These  algorithms  usually 
consist  of  two  iterative  processes/  which  are  interrelated. 

The  terms  "outer  loop  iterator"  and  "inner  loop  iterator" 
are  introduced  to  denote  these  two  iterative  processes.  The 
reasons  for  this  designation  will  become  apparent  when  the 
algorithm  is  introduced. 

Before  presenting  the  general  gradient  algorithm  some 
nomenclature  and  definitions  have  to  be  intrdocued.  Let  the 
control,  the  gradient,  the  direction  of  search,  and  the 
control  correction  parameter  at  the  n  iteration  be  denoted 
by  un,  gn,  sn  and  yn,  respectively?  where  UneU  for  all 
n>0 ,  <3neG  for  all  n>0,  sneS  for  all  h>  0 ,  and  y^eR  for  all 
n>0,  and  where  u,  G,  and  S  are  real  separable  Hilbert  spaces. 
In  the  cases  to  be  considered  spaces  G,  S,  and  U  are  iden¬ 
tical  . 

Definition  3,1:  The  outer  loop  iterator,  specified  by  the 
particular  gradient  method  employed,  implicitly  determines 
the  direction  of  search  sn  and  explicitly  performs  the  control 
iteration;  and  is  given  by 
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n+1 


—  O,  ( u  « Y  _  »  u  .#•»•#  u  ) # 
L  n  n  n-1  n-m 


n-m' 


(3.1) 


n-m 


where  Or  :  R  x  IT  U-HJ,  and  m  denotes  the  number  of  back 
L  1*0 


points  used  in  the  iteration. 


Remarks;  1.  The  semicolon  in  Equation  3.1  separates  the  point 
at  which  new  data  are  used  from  the  point  at 
which  old  data  are  reused. 

2.  The  iteration  formula  defined  by  Equation  3.1 
is  referred  to  as  a  one  point  iterator  with 
memory  (33)  . 


Definition  3.2:  The  inner  loop  iterator  determines  the 
control  correction  parameter  Yn*  and  is  given  by 


Yi+1  "  VYi'J[un+Yisn]'  9<u„+y;>J), 


n  ’  n  n 


where 


R1xR1xUxU^R1 . 


(3.2) 


Gradient  Method  Algorithm 

The  interrelationship  between  the  inner  loop  and  the 
outer  loop  iterators  is  best  illustrated  in  the  gradient 
method  algorithm.  This  algorithm  is  as  follows: 

Outer  loop  iteration 

1.  For  n=0,  guess  an  initial  control  function  u^ . 


2.  Calculate  the  gradient  of  the  cost  functional 


g ( u  )  =  g  bv : 

a.  integrating  the  state  system  from  tg  to  T ^ ; 

b.  integrating  the  costate  system  backwards 
from  T^  to  tg. 

3.  Calculate  the  direction  of  search  s  . 

n 

4.  Inner  loop  iteration;  calculate  the  control 
correction  parameter  y  . 

5.  Calculate  the  control  correction. 

6.  Test  the  convergence  criteria;  if  these  tests  are 
not  satisfied,  increase  n  and  repeat  computations 
beginning  with  step  2. 

The  logic  flow  chart  for  the  above  algorithm  is  presented 
in  Figure  3.1. 

The  various  gradient  methods  differ  principally  in  the 
means  used  to  determine  the  direction  of  search  (step  3) , 
and  the  control  correction  parameter  yn  (step  4).  The 
conjugate  gradient  method  and  the  Davidon  method  are  outer 
loop  iterators  with  memory,  whereas  the  steepest  descent 
method  is  an  outer  loop  method  without  memory,  i.e., 
steepest  descent  always  searches  in  the  negative  gradient 
direction.  Thus,  the  conjugate  gradient  method  and  the  Davidon 
method  are  able  to  utilize  the  results  of  previous  iterates 
to  improve  the  direction  of  search;  and  hence  converge  more 
rapidly  than  the  methods  without  memory. 


31 


Outer  loop  iterators 

The  approximation  theory  developed  in  the  next  chapter 
applies  to  gradient  methods  in  general.  However,  numerical 
results  will  be  presented  for  only  the  following  three 
gradient  methods:  the  steepest  descent  method,  the  con¬ 
jugate  gradient  method,  and  the  Davidon  method.  A  brief 
introduction  to  each  of  these  methods  is  presented  below. 

The  steepest  descent  method  The  steepest  descent 
method  is  perhaps  the  oldest  of  the  direct  search  methods. 

This  method  was  originally  developed  for  minimizing  a  function 
defined  on  a  real  Euclidean  vector  .'-pace.  An  account  of  this 
method  was  given  as  early  as  1847  by  Cauchy.  Later,  it 
was  named  the  method  of  gradients  by  Hadamard.  In  1945  the 
steepest  descent  method  was  extended  to  the  case  where  the 
function  is  defined  on  a  Hilbert  space  (34) .  More  recently 
Bryson  et  al.  (35,  36)  and  Kelley  (37)  have  used  the  steepest 
descent  method  to  solve  lumped  parameter  optimal  control 
problems.  Several  authors  (9,  28,  29  and  30)  have  applied 

the  steepest  descent  method  to  the  distributed  parameter 
optimal  control  problem. 

The  basic  philosophy  of  the  steepest  descent  method  is 
very  simple.  The  maximum  rate  of  decrease  of  J  in  a  neighbor¬ 
hood  of  an  admissible  control  u  is  in  the  direction  defined 

n 

by  -gn.  This  direction  defines  the  half-ray  un+]_=un~‘V'gn ,  Y>.0  • 
Thus  to  obtain  the  maximum  decrease  in  the  cost  index,  the 
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best  local  direction  of  search  is  in  the  negative  gradient 
direction;  hence,  the  method  is  named  steepest  descent. 
Consequently  the  outer  loop  iterator  is  given  by 


u 


n+1 


=  u  -  v  g 
n  'n^n 


(3.3) 


where  the  control  correction  parameter  Yn  is  determined  by 
the  inner  loop  iterator. 

It  is  important  to  note  that  in  the  general  case  the 

direction  of  search  s  defines  the  direction  of  maximum 

n 

decrease  in  J  only  for  vn  arbitrarily  small.  In  practice 

the  selection  of  small  control  correction  parameters  leads 

to  excessive  iterations.  In  fact  to  insure  that  {u  }-*u*, 

n 

Yn  must  be  bounded  away  from  zero.  If  YN=0  for  some  N^O , 
then  becomes  a  fixed  point  of  the  outer  loop  iterator;, 
but  gN  is  not  necessarily  the  null  vector,  and  hence  is 
not  necessarily  the  minimizing  element  u* .  The  slow 
convergence  of  the  steepest  descent  method  near  the  solution 
can  be  attributed  to  the  fact  that  as  the  iteration  converges 
the  gradient  tends  to  the  null  vector.  Hence,  the  control 
correction  Ynl|gnll  becomes  excessively  small,  unless  proper 
precautions  are  taken  in  the  selection  of  y  .  This  brief 
discussion  indicates  the  importance  of  the  inner  loop 
iterator. 

The  simplicity  and  the  stability  of  the  steepest  descent 
method  enables  it  to  be  adapted  to  many  difficult,  practical 
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problems.  These  characteristics  are  important  to  practicing 
engineers c  and  they  often  outweigh  the  slow  convergence 
properties  of  the  steepest  descent  method. 

The  conjugate  gradient  method  The  conjugate  gradient 
method  was  originally  developed  as  a  method  for  solving  a  set 
of  linear  algebraic  equations;  the  solution  of  the  set  of 
equations  being  related  to  the  minimum  (maximum)  of  a  certain 
properly  selected  cost  index  (38).  In  1954  Hayes  (39) 
extended  the  original  conjugate  gradient  method  to  linear 
operator  equations  in  a  Hilbert  space.  Since  then  (40) 
and  (41)  have  also  considered  the  adaptation  of  this  method 
to  the  solution  of  linear  operator  equations.  Fletcher  and 
Reeves  (42)  then  modified  the  conjugate  gradient  method  and 
used  it  to  develop  a  parameter  optimization  algorithm. 

Lasdon  et  al,  (43)  ,  and  Sinnott  and  Luenberger  (44)  extended 
the  conjugate  gradient  method  to  lumped  parameter  optimal 
control  problems . 

The  conjugate  gradient  method  is  a  gradient  method  with 
memory.  The  motivation  for  this  method  is  given  by  the  follow¬ 
ing  considerations.  Let  the  set  of  admissible  controls  U  be  a 
real,  separable  Hilbert  space,  i.e.,  U  contains  a  countable  dense 
subset.  The  separability  insures  the  existence  of  at  least 

one  linearly  independent  set  of  basis  vectors  (s  ),  s  eU, 

n  n 

such  that  the  finite-dimensional  subspaces  spanned  by 
(Sq ,s^ , . . . »sn_^)  form  an  expanding  sequence  of  subspaces. 
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whose  union  is  the  closure  of  the  control  space.  If  for  each 
n^_0,  the  inner  loop  iterator  minimizes  J  over  the  translation 
of  the  one-dimensional  subspace  defined  by  un+^=u  +ys  ,  then 

J^un+l^=Jfun+Ynsn^-J^Un+YSn''  for  a11  Y-° '  (3.4) 


and 


,|V!|,J|VlVn.ll-JiVl*',W  for  aU  Y-°- 


(3.5) 


Thus,  two  one-dimensional  minimizations  are  sequentially  per¬ 
formed  over  a  translation  of  the  subspaces  spanned  by  sn  and 
sn+^,  respectively.  The  following  important  question  now 
arises.  How  can  the  direction  of  search  sn+^  be  selected 
such  that  the  result  of  this  sequence  of  two  one-dimensional 
minimizations  give  the  same  solution  as  would  a  two-dimensional 
minimization  over  the  translation  of  the  two-dimensional  sub¬ 
space  spanned  by  fsn,sn+^}.  That  is,  how  should  sn+^  be  deter¬ 
mined  such  that 


J  (u„  1.0  ]  <J  (u  +<*s  +Bs_  , .  ]  for  all  a>0  and  8>0.  (3.6) 

n+z  —  n  n  n+i  —  — 


The  conjugate  gradient  method  generates  such  an  outer  loop 
iterator.  This  means  that  the  solution  obtained  by  performing 
a  sequence  of  one-dimensional  minimizations  over  a  properly 
selected  set  of  translated  subspaces  yields  the  minimum  of 
the  functional  over  the  translated  subspace  spanned  by  this 
set.  This  method  is  referred  to  as  the  "method  of  expanding 
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subspaces"  . 

At  the  present  time  there  exist  two  versions  of  the 
conjugate  gradient  method;  the  original  version  is  developed 
in  (38),  and  the  modified  version  is  developed  in  (45). 
Willoughby  (46) ,  presents  an  excellent  discussion  and  com¬ 
parison  of  these  two  versions;  and  demonstrates  numerically 
that  on  quadratic  functionals  these  two  methods  do  not 
produce  identical  iterations  as  the  theory  predicts.  Never¬ 
theless,  the  modified  version  requires  substantially  less  com¬ 
putation;  hence,  it  will  be  utilized  in  what  follows. 

In  the  modified  conjugate  gradient  method  the  direction 
of  search  is  determined  as  follows: 


s„  =  -g„  +  8s  ,  , 
n  n  n  n-i 


where 


Bn  = 


<gn'  gn> 
<gn-l '  gn-l> 


(3.7) 


(3.8) 


if  n=0,  then  8q  =  0. 

The  outer  loop  iterator  for  the  conjugate  gradient  method  is 
given  by 


u 


n+1 


=  u 


+  v  s 
1  n  n 


(3.9) 


The  second  term  on  the  right  hand  side  of  Equation  3.7  is 
the  memory  element.  This  term  deflects  the  direction  of 
search  from  the  negative  gradient  direction.  The  modified 
conjugate  gradient  method  is  particularly  simple  to  program, 
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requires  little  additional  computation  and  storage  in  com¬ 
parison  with  the  steepest  descent  metHod,  and  in  general  con¬ 
verges  much  faster  than  the  steepest  descent  method. 

The  Davidon  method  The  Davidon  method  is  another 
popular,  second  generation  gradient  method.  It  was  developed 
by  Davidon  (47)  in  1959,  who  referred  to  the  method  as 
the  "variable  metric  method" .  The  Davidon  method  was  original¬ 
ly  developed  as  a  parameter  optimization  method.  Fletcher 
and  Powell  (48)  present  numerical  results,  and  proofs  of 
convergence  and  stability  for  the  finite  dimensional  case. 
Horwitz  and  Sarachik  (49),  and  Tokumaru  ej:  al^.  (50),  have 
recently  extended  Davidon 's  method  to  quadratic  functionals 
defined  on  a  real  separable  Hilbert  space?  in  (50)  numerical 
results  are  included  for  a  lumped  parameter  optimal  control 
problem. 

The  Davidon  method  like  the  conjugate  gradient  method 
is  based  on  the  quadratic  approximation.  In  the  quadratic 
case  let  A  denote  the  self-adjoint  operator  generating  the 
quadratic  functional,  and  in  the  non-linear  case  let  A 
denote  the  Hessian  operator;  then,  the  Davidon  method  deter¬ 
mines  a  direction  of  search 

s  =  -H  g  ,  (3.10) 

n  n^n 

where  :  U-*-U,  such  that  the  sequence  of  operators  {H^A} 
converge  to  the  identity  operator.  Thus,  the  sequence  of 


operators  {Hn}  converge  to  the  inverse  Hessian  A  .  This 
means  that  as  the  Davidon  iteration  progresses,  it  becomes 
similar  to  Newton's  second  order  method.  This  fact  accounts 
for  the  rapid  convergence  of  the  Davidon  method.  The 
Davidon  deflection  operator  is  determined  iteratively  as 
follows : 


Hn+lf  "  Hnf  +  <f'Pn>Pn-<^n><  ' 


(3.11) 


where  feU, 


Hq=I  (or  any  other  idempotent  operator) , 


(3.12) 


P  =  p  //<p  ,y  >  , 
*n  Fn,jrn 


(3.13) 


=  ' 


(3.14) 


%  ’  Hnyn ' 


(3.15) 


yn  =  (WV/Yn' 


(3.16) 


and  Yn  is  determined  by  the  inner  loop  iterative  such  that 


J^Un+YnSn^  -  J^Un+Ysn^  for  a11  Y— 0  * 


(3.17) 


The  Davidon  method  generates  an  outer  loop  iterator  given 


un+l  "  un  *  W 


(3.18) 


where  yn  and  sn  are  determined  from  Equations  3.11  through 
3.17.  The  Davidon  method  contains  memory  because  of  the 


Davidon  weighting  ooerator  H . 
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As  evident  from  Equation  3.11  the  storage  requirement 
of  the  Davidon  algorithm  increases  with  the  number  of 
iterations.  Thus,  even  on  the  large  modern  digital  computers 
storage  problems  arise,  if  a  large  number  of  iterations  are 
required  to  achieve  convergence.  This  drawback  of  the  Davidon 
method  has  lead  to  the  practice  of  restarting  the  iteration 
every  q  iterations.  This  modification  of  the  Davidon  method 
is  referred  to  as  the  Davidonlq]  method  (51).  By  restarting 
the  Davidon  method  every  q  iterations,  the  storage  require¬ 
ment  of  the  Davidon  method  is  at  least  bounded.  However, 
when  coupled  with  the  inherent  storage  problems  associated 
with  distributed  parameter  systems,  even  the  Davidon (q) 
method  presents  storage  problems. 

Inner  loop  iterators 

As  indicated  previously  the  inner  loop  iterator  deter¬ 
mines  the  amount  of  control  correction.  Consequently,  the 
convergence  of  the  inner  loop  iterator  directly  influences 
the  convergence  of  the  outer  loop  iterator.  In  fact,  when 
the  errors  due  to  the  various  discrete  approximations  made  in 
solving  the  problem  on  a  digital  computer  are  considered, 
it  is  the  inner  loop  iterator  which  determines  the  success 
or  failure  of  the  overall  iteration.  A  detailed  discussion 
of  this  fact  will  be  deferred  until  the  approximation  theory 
is  introduced. 

The  most  popular  inner  loop  iterators  are  those 
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which  perform  a  linear  minimization  in  the  direction  of 
search  s^.  In  theory  all  of  these  methods  converge  eventually 
to  the  same  fixed  point.  However,  in  practice  this  is  indeed 
not  the  case  because  of  gradient  errors.  The  analysis  of 
the  effects  of  gradient  errors  on  the  inner  loop  iterator  will 
also  be  given  in  the  next  chapter. 

The  three  most  popular  inner  loop  iterators  are  the 
following  s 

1.  Cubic  interpolation  based  on  functional  values  and 
directional  derivative  values  (52) . 

2.  Cubic  or  quadratic  interpolation  based  on  functional 
values  (52) . 

3.  Linear  interpolation  based  on  directional  derivative 
values,  i.e.,  regula  falsi  (53). 

When  there  are  no  errors  associated  with  either  the  calcula¬ 
tion  of  the  cost  index  J  or  the  gradient  g,  then  method  1 
above  is  cubically  convergent,  while  methods  2  and  3  are 
quadratically  convergent.  Thus  in  this  case  method  1  is  the 
superior  of  these  three  methods.  This  is  not  the  case  when 
discretization  errors  are  encountered.  In  fact  in  this  case, 
method  1  turns  out  to  be  the  least  efficient  of  these  three 


methods . 
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General  Results  for  Gradient  Methods 

The  following  results  are  listed  for  future  reference. 

The  cited  references  contain  neither  the  first  nor  the  only 
proof  available. 

Theorem  3.1.:  Let  U  be  a  real  separable  Hilbert  space  with 
inner  product  <’,•>  and  norm  ||»||  =  /<■,•>,  let  A  be  a  self- 
adjoint  operator  defined  on  U  such  that 

mA| | f | I2  <  <f » Af>  <  ma| | f | |2, 

and  let  J[.]  be  a  quadratic  functional  defined  on  U  and  given 
by 

J[u]  =  JQ  +  <c,u>  +  |<u,Au> 

with  minimum  at  u*=-A  c?  then,  the  steepest  descent  method 
(54) ,  the  conjugate  method  (original  or  modified)  (41)  ,  and 
the  Davidon  method  (50)  with  inner  loop  iterators  1,  2  or  3 
generate  a  sequence  {un)-+u*;  and  a  sequence  {g(un)}-*-0. 

Theorem  3.2.;  (32)  For  the  problem  defined  in  Theorem  3.1  the 

direction  of  search  vectors  s„  of  the  Davidon  method  and  the 

n 

conjugate  gradient  method  are  positive  scalar  multiples  of 
each  other. 

Remark :  The  proof  of  Theorem  3.2  presented  in  (32)  is  only 


valid  for  the  finite-dimensional  case;  however,  it  can  be 
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generalized  with  minor  extensions. 

The  above  two  theorems  are  particularly  significant  in 
this  study  and  will  be  used  repeatedly  in  what  follows. 

Theorem  3.1  demonstrates  that  at  least  theoretically  all 
three  of  these  popular  outer  loop  iterators  converge  to  the 
minimizing  element.  Theorem  3.2  presents  a  connection  between 
the  conjugate  gradient  method  and  the  Davidon  method.  Due 
to  the  generality  of  these  two  theorems,  they  certainly  apply 
to  distributed  parameter  optimal  control  problems.  The  proof 
of  convergence  of  these  methods  for  the  general  non-linear 
problem  is  not  a  closed  question.  However,  it  is  at  least 
intuitively  clear  that  if  the  functional  is  smooth  and 
convex,  then  these  methods  converge  to  the  solution.  This 
argument  is  founded  on  the  quadratic  nature  of  a  smooth  convex 
functional  near  the  minimum.  Theorem  3.1  does  not  ensure, 
however,  that  the  discretized  numerical  approximation  to  the 
problem  defined  in  Theorem  3.1  will  converge.  This  is  sig¬ 
nificant  because  it  is  the  discretized  version  of  this  problem 
that  is  actually  solved  by  the  digital  computer  algorithm. 

The  consideration  of  the  discretized  approximation  to  the 
optimization  problem  defined  in  Theorem  3.1  will  be  considered 
in  the  subsequent  chapter. 


i 
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CHAPTER  IV.  AN  APPROXIMATION  THEORY 
FOR  GRADIENT  METHODS 

Gradient  methods  are  iterative  procedures  and  are  there¬ 
fore  only  practical  when  programmed  on  a  high  speed  digital 
computer.  Thus  the  original  continuous  problem  is  actually 
replaced  by  a  discrete  problem.  In  the  process  of  trans¬ 
forming  the  continuous  problem  into  its  discrete  analog  a 
number  of  approximations  are  made  which  introduce  errors. 
Basically  two  types  of  approximations  are  involved: 

(1)  approximations  to  elements  of  a  Hilbert  space  (e.g.,  the 
approximation  of  functions  by  piecewise  polynomials) ;  (2) 
approximations  of  operators  defined  on  a  Hilbert  space 
(e.g. ,  approximations  of  differential  and  integral  operators 
by  finite-difference  and  summation  operators,  respectively). 
In  addition,  there  are  always  errors  encountered  which  are 
due  to  numerical  round-off. 

Until  recently  the  analysis  of  the  effects  of  these 
various  approximations  on  the  solution  of  optimization 
problems  has  been  neglected,  in  some  cases  with  justification 
and  in  others  without  justification.  For  example,  in  early 
studies  of  the  numerical  solutions  of  parameter  optimization 
problems  the  effects  of  round-off  were  considered  important. 
These  effects  have  been  studied  from  the  statistical  point  of 
view  (55) .  When  finite-difference  formulas  are  employed  in 
parameter  optimization  problems  to  calculate  the  gradient 
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vector,  then  the  truncation  errors  of  these  formulas  are 
encountered.  Stewart  (56)  approaches  this  problem,  in  the 
current  fashionable  manner,  by  attempting  to  eliminate  the 
truncation  error.  Previous  experience  (57)  by  this  author 
reveals  that  this  approach  is  not  an  answer,  but  only  a  cure 
and  only  an  approximate  cure  at  best.  During  the  study 
presented  in  (57) ,  the  need  for  an  analysis  of  the  effects  of 
gradient  errors  on  gradient  methods  became  evident. 

Recently  two  excellent  papers  (58,  59)  have  been  pub¬ 
lished  which  discuss  the  discretization  of  the  continuous 
lumped  parameter  optimal  control  problem.  These  papers  are 
concerned  with  demonstrating  the  convergence  of  the  solution 
of  the  discrete  problem  to  the  solution  of  the  continuous 
problem,  as  the  discretization  parameters  are  refined.  From 
a  theoretical  point  of  view  this  is  significant;  however,  in 
practice  discretization  is  finite  and  cannot  tend  to  zero. 

For  as  one  attempts  to  let  the  discretization  tend  to  zero 
difficulties  arise  immediately  in  connection  with  round-off 
errors.  As  a  simple  example  of  this  phenomenon,  consider  the 
approximation  of  a  derivative  by  a  finite-difference  formula 

(e.g.,  f '  (x)  =  lim  f  f  t  if  this  limiting  process  is 

h+0 

attempted  on  a  finite  word  length  digital  computer,  the  effects 
of  round-off  are  vivid. 

Fortunately,  in  the  case  of  lumped  parameter  optimal 
control  problems  the  effects  of  truncation  error  can  be  con¬ 
trolled.  This  is  largely  due  to  the  advanced  development  of 
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the  state  of  the  art  of  the  numerical  solution  of  ordinary 
differential  equations.  It  is  not  meant  to  imply,  however, 
that  the  effects  of  these  various  approximations  can  be 
overlooked  in  the  case  of  lumped  parameter  problems.  For 
example  a  common  practice  in  the  numerical  solution  of  lumped 
parameter  optimal  control  problems  is  to  use  a  fourth-order 
Runge-Kutta  integration  method  in  the  forward  integration  of 
the  state  system,  and  then  to  utilize  linear  interpolation 
(a  first-order  method)  tc  obtain  the  required  midpoint  values 
of  the  state  system  on  the  backwards  integration  of  the  co¬ 
state  system.  The  inconsistency  is  obvious.  The  estimates 
for  the  errors  induced  by  this  type  of  inconsistent  practice 
on  the  overall  solution  is  still  an  open  question. 

The  errors  of  the  discrete  approximations  involved  in  the 
solution  of  distributed  parameter  optimization  problems  on 
a  digital  computer  are  in  general  larger  than  in  lumped 
parameter  optimal  control  problems.  Hence,  the  effects  of 
discretization  errors  upon  gradient  methods  are  more 
pronounced  in  distributed  parameter  problems. 

The  computation  of  the  gradient  vector,  which  for 
gradient  methods  is  required  at  least  once  on  every  outer 
loop  iteration,  primarily  consists  of  the  forward  integration 
of  the  state  system  and  the  backwards  integration  of  the  co¬ 
state  system.  Thus  in  the  solution  of  distributed  parameter 
optimal  control  problems  by  gradient  methods, the  repetitive 
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computation  of  the  gradient  vector  constitutes  a  large 
percentage  of  the  total  computing  effort.  Hence,  if  high 
order  finite-difference  methods  are  employed  in  the  solution 
of  the  state  and  costate  systems,  then  excessively  long  com¬ 
puter  run  times  result.  If  lower  order  finite-difference 
methods  are  used  with  a  small  mesh  to  improve  the  accuracy, 
then  storage  problems  arise.  In  addition  for  distributed 
parameter  systems,  it  is  a  general  experience  (60)  that 
high  order  difference  formulas  are  usually  quite  disappointing 
in  practice.  This  is  in  contrast  to  the  situation  for  lumped 
parameter  systems,  where  methods  like  Runge-Kutta  achieve 
remarkable  accuracy  with  little  computing  effort.  The  reason 
for  this  difference  is  a  basic  one:  for  lumped  parameter 
systems  the  initial  conditions  are  elements  of  a  real 
Euclidean  vector  space,  and  thus  can  be  represented  to  a  high 
degree  of  accuracy  on  a  digital  computer,  with  the  error 
being  of  the  same  order  as  the  local  round-off  error;  how 
accurately  the  solution  at  t+AT  is  computed  then  depends  only 
upon  the  utilization  of  the  information  available;  for  dis¬ 
tributed  parameter  systems  the  initial  conditions  are  elements 

2 

of  a  function  space  (e.g.,  the  Hilbert  space  L  ),  and  thus 
cannot  be  represented  to  such  a  high  degree  of  accuracy  on 
a  digital  computer,  since  it  would  be  necessary  to  store  an 
infinite  number  of  quantities  at  t=tg ;  therefore,  in  com¬ 
puting  the  solution  at  t=tQ+At,  one  is  limited  by  a  lack  of 
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needed  information.  Consequently,  only  moderately  accurate 
finite-differences  methods  for  the  solution  of  the  state  and 
the  costate  systems  are  possible  with  gradient  methods.  It 
will  be  shown  that  errors  introduced  by  the  finite-difference 
solution  of  the  state  and  costate  systems  cause  errors  in 
the  computation  of  the  gradient  vector.  Therefore,  it  becomes 
necessary  to  consider  the  effects  of  gradient  errors  on  the 
class  of  gradient  methods. 

The  Effects  of  the  Discrete  Approximations 
on  the  Gradient  Vector 

As  indicated  in  Chapter  III  the  convergence  of  gradient 
methods  depends  strongly  on  the  gradient  of  the  cost  index. 
Therefore,  it  seems  reasonable  that  the  analysis  of  the  prop¬ 
erties  of  the  approximate  gradient  algorithms,  such  as,  con¬ 
vergence,  stability,  and  efficiency,  would  depend  essentially 
on  the  analysis  of  the  effects  of  gradient  errors. 

In  the  optimization  of  distributed  parameter  systems, 
all  of  the  approximations  (approximation  of  functions,  approxi¬ 
mation  of  operators  and  round-off)  are  present,  and  contribute 
to  gradient  errors.  In  the  investigation  of  these  approxi¬ 
mations  ,  several  results  considered  below  are  important. 

Let  the  set  of  admissible  controls  U  be  a  real 
separable  Hilbert  space,  and  let  U  be  a  set  generated  by  the 
application  of  the  discretizing  transformation  E(h)  to  the 
elements  of  U,  that  is 
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U=  {y :  y=E(h)u,  for  all  ueU)  ,  (4.1) 

where  the  discretizing  transformation  E(h)^  is  an  evaluation 
map  defined  on  the  nodes  N  of  a  net  ti  ,  and  h  is  the  discreti¬ 
zation  parameter  of  this  net  (the  definitions  of  an  evaluation 
map,  a  net,  and  the  nodes  of  a  net  are  given  in  Appendix  A). 


Example  4.1.;  Let  f(t)eC  for  all  teT,  where  T=(t:  a<U:<b}, 
and  let  the  nodes  be  the  set  N,  where  N={t^:  a-t^ , <t2 <. . . < 
tn=b,  t^+^=t^+h}.  The  discretizing  transformation  is  defined, 
in  this  case,  as 


E(h)f  = 


£<V 

♦ 

f<V 


(4.2) 


Let  U  be  a  function  space  generated  by  the  application 
of  an  interpolating  transformation  Q  to  the  elements  of  IX, 
that  is 

U  ={u:  u=Qy,  for  all  yeti)  •  (4.3) 

Example  4.2.;  Let  U  be  the  set  of  all  piecewise  quadratic 
polynomials  defined  on  the  set  °LL  determined  from  Example  4.1. 

Some  properties  of  the  sets  IX  and  U,  and  the  transformations 
E(h)  and  Q,  which  are  pertinent  to  this  study,  are  given  in 
the  following  lemmas.  Proofs  of  these  results  are  given  only 
in  those  cases  where  standard  references  are  not  available. 


^For  notational  simplicity,  the  explicit  dependence  of  E 
on  h  will  be  often  dropped,  i.e.,  EHE(h). 


Lemma  4.1.;  The  set  it  is  a  finite  dimensional  linear  space. 

Proof :  Follows  from  the  fact  that  an  evaluation  map  is  a 

functional  on  U. 

Lemma  4.2.:  The  set  U  of  piecewise  polynomials  is  a  finite¬ 
dimensional  subspace  of  U. 

Proof :  Clearly  UCU,  and  otu^+u2  is  a  piecewise  polynomial 
for  all  scalars  a  and  vectors  u1,u2eU?  hence,  U  is  a  sub¬ 
space  of  U. 

Remarks:  (i)  ll  is  complete?  hence,  with  the  addition  of 

an  inner  product  it  would  be  a  Euclidean  space 

(ii)  and  U  are  isomorphic. 

Lemma  4.3. :  The  interpolating  transformation  Q  is  a  linear 
transformation  from  *U  to  U. 

Proof :  This  lemma  follows  immediately  from  the  fact  that  the 
interpolation  formulas  defining  Q  are  linear  in  the  function 
values  on  the  nodes. 

Example  4.3.:  Consider  the  following  one-dimensional  piece- 
wise  quadratic  interpolation  formula 

Qf (t ) =-0 . 5  s (l-s)f (l-l)  +  (l-s)  (l+s)f  (I)+0.5  s(l+s)f(I+l)  , 

where  s^ft-tjJ/h,  and  f(N)  denotes  the  values  of  the  function 
on  the  nodes.  Thus 
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1  +  1 

Qf (t)  =  E  a. f (j) , 

j=I“l  11 

and  linearly  follows  immediately,  since 
1+1 

Q[ctf+Sg]=  E  a.  [etf  (j  )+8g  ( j)  ] 
j=I-l  11 

=  ctEa^ f ( j ) tgEa^g ( j )=aQ£+BQg. 

Thus,  even  though  interpolation  between  the  node  points 
might  be  quadratic,  the  operation  of  interpolation 
defined  on  the  discrete  space ‘It  is  a  linear  transformation. 

Lemma  4,4.;  For  the  transformations  E(h)  and  Q, 

(i)  E  }h)  does  not  exist,  and 

(ii)  Q  1  =  E  (h)  . 

Proof :  (i)  obvious 

(ii)  E  (h ) Gu-E (h) u=y  because  the  node  points  are  not 
altered  by  Q;  hence,  E(h)Q=I,  similarly 
QE  (h)  u=Qy=u;  hence  QE(h)=I. 

Remark:  On  the  subspace  U,  E(h)  has  an  inverse,  i.e., 

E  }h)=Q  on  U. 

Lemma  4.5.:  The  product  transformation  defined  by  P=QE(h) 
is  a  idempotent  operator  from  U  onto  the  finite-dimensional 
subspace  U. 
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2  .» 

Proof:  P  u=QEQEu=QEQu=QEu=Qvi=u ,  and 

2  2 
P  u=-u=Qu=QEu=Pu .  Thus  P  =P . 

In  the  actual  computational  process  on  a  digital  computer, 
the  discretization  is  accompanied  by  the  truncation  of  all 
but  a  finite  number  of  digits  (approximately  fourteen  digits 
in  double-precision) .  This  is  due  to  the  finite  word  length 
of  a  digital  computer.  Let  T  denote  the  truncation  operator, 
then  the  Hilbert  space  U  is  transformed  into  the  "digital" 
space  D  by  the  transformation  TE(h).  In  addition,  when  the 
pseudo  binary  operations  of  addition,  subtraction,  division, 
and  multiplication,  which  are  performed  by  the  digital  com¬ 
puter,  are  considered  then  this  "digital"  space  is  no  longer 
a  linear  space.  For  example,  because  of  numerical  round¬ 
off,  the  distributive  law  is  no  longer  exactly  satisfied. 
However,  if  stable  finite-difference  methods  and  double 
precision  arithmetic  are  utilized,  then  the  effects  of  round¬ 
off  become  secondary  to  the  other  error  sources.  Thus,  for 
the  problems  considered  in  this  work  IX  can  be  considered 
to  be  the  digital  space.  Hence,  the  discretization  process 
can  be  thought  of  as  a  projection  of  the  continuous  problem 
onto  the  finite-dimensional  subspace  U.  The  accuracy  of  the 
approximate  solution  then  depends  largely  on  the  dimension¬ 
ality  of  the  space  V,,  and  on  the  interpolation  formulas 
representing  Q.  The  relationships  between  the  spaces  U ,  , 

U,  and  D  are  illustrated  in  Figure  4.1. 


Figure  4.1.  The  discretization  process 


Many  theoretical  results  exist  for  optimization  problems 
in  a  function  space.  Unfortunately,  the  elements  of 
function  space  and  the  operators  defined  on  a  function  space 
cannot  be  exactly  represented  on  a  digital  computer;  hence, 
approximations  must  be  considered.  Lemma  4.4  insures  that 
the  approximate  optimization  problem  can  be  considered  to  be 
in  either  the  discrete  space  Vl  or  in  the  function  space  U. 
Admittedly,  the  solution  can  be  calculated  at  only  a  finite 
set  of  points;  however,  there  can  be  more  information  speci¬ 
fied  about  a  function  than  merely  its  values  at  a  finite  set 
of  points,  e.g.,  the  functions  are  polynomial,  differentiable, 
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etc.  .  Thus,  it  is  felt  that  the  elements  of  the  subspace  U 
give  a  more  complete  description  of  the  approximate  solution, 
and  solving  the  problem  in  this  subspace  is  more  in  the 
spirit  of  the  original  continuous  problem.  However,  regard¬ 
less  of  whether  the  approximate  solution  is  considered  to  be 
in  the  space  \L  or  in  the  space  U,  the  information  which  is 
lost  due  to  discretization  cannot  be  completely  regained 
(E  }h)  does  not  exist).  Therefore,  discretization  error  is 
caused  by  the  loss  of  information  in  the  initial  and 
boundary  conditions  of  the  state  system  and  in  the  initial 
control  due  to  the  transformation  E(h). 

The  exact  gradient  of  J  for  quadratic  programming 
problems  is  given  in  Equation  2.30  as 

g (u)  =  c+Au  .  (4.4) 

Along  with  this  exact  gradient  the  approximate  gradient 
given  by 

g (u)  =  c+Au  .  (4.5) 

is  considered.  The  first  question  to  be  answered  is  the 
following.  How  do  the  approximations  of  discretization  and 
truncation  (round-off  is  neglected)  effect  the  calculation 
of  the  approximate  gradient  g(u}7 

For  the  purpose  of  illustrating  how  each  of  these 
approximations  enter  into  the  calculation  of  g(u),  consider 
the  following  simple  problem: 


J 
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minimize 

J[ud<r,t>]  =  ||  |x(r,Tf) |  |2  +  ||  |  U<a  (r ,  t)  |  |  2  ,  (4.6) 

subject  to 


Sx  (r ,  t)  =  ud(r,t)  , 

(4.7) 

x(r  ,0)  =  xQ (r) , 

(4.8) 

xt(r,0)  =  0, 

(4.9) 

x(0,b)  =  0, 

(4.10) 

x  ( 1 ,  t )  =  0, 

(4.11) 

where 

S  = 


9  2 

— 2  ' 
3r 


T  -  R. 

■y  f  f  5 

|  |  x |  |  =  x* (r , t) drdt „ 

J0  J0 


and  Tf  =  4 ,  Rf  -  1.  From  Equations  2.23,  2.25,  and  2.30  the 
gradient  is  given  by 


-1  *  -1 

g(u)  =  ud(r,t)  +  [S  (Tf)]  [4>(Tf)xQ  (r)+S  (Tf )  ud  (r  ,  t)  ]  , 

(4.12) 

-1 

where  the  term  4>(Tf)xQ(r)  +  S  (Tf)ud(r,t)  represents  the 
forward  integration  of  the  state  system  from  tg  to  ,  and 
the  second  term  on  the  right  hand  side  of  Equation  4.12  is 
then  given  by  [S  ^Tf) )  (x(r,Tf)]  which  represents  the 

backwards  integration  of  the  costate  system.  This  explains 
the  reason  for  steps  (a)  and  (b)  in  the  gradient  algorithm 
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given  in  Chapter  III.  In  calculating  the  gradient  on  the 
computer  the  differential  operators  S  and  S*  are  actually 
replaced  by  finite-difference  formulas  which  are  truncated 
approximations  of  S  and  S*.  This  introduces  truncation 
errors.  Let  represent  the  finite-difference  approximations 
to  S,  and  let  ^Ct^EXq  denote  the  finite-difference  solution 
of  the  homogenous  state  system.  Then  the  discrete  approxi¬ 
mation  of  Equation  4.12  yields  the  approximate  gradient  (dis¬ 
cretized) 

-1  *  -1 

cj(y )  =Eg=Eud+ 1  c#(Tf)]  [4>(Tf)Ex0+  rf(Tf)Eudl  .  (4.13) 

In  Equation  4.13  discretization  errors  are  introduced  by  the 
approximation  of  the  initial  conditions  x0  by  ExQ  and  by  the 
approximation  of  the  control  ud  by  Eud<*  truncation  errors 
are  introduced  by  the  approximation  of  differential  operators 
by  truncated  finite-difference  operators,  which  are  reore- 
sented  by  ^  ^  and  <fi ,  respectively.  To  be  consistent  the  order 
of  the  interpolation  formulas,  represented  by  Q,  should  be  the 
same  as  the  order  of  the  finite-difference  method,  represented 
by  .  Little  additional  accuracy  can  be  obtained  by  making  the 
order  of  Q  higher  than  the  order  of  ,  and  if  the  order  of  Q 
is  lower  than  the  order  of  &  ,  then  interpolation  error  (see 
Appendix  A)  is  being  needlessly  introduced. 
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The  Effects  of  Gradient  Error 
on  Gradient  Methods 

Let  un+i=G(un'sn'Yn;  un_x'  •••»  ^n-in^  represent  the 

exact  gradient  iteration,  then  the  approximations  discussed 

above  yield  what  will  be  referred  to  as  the  approximate 

gradient  method  u  ,,=G(u  ,5  ,Y  ;  u  u  ).  The  follow- 

’  n+1  n  n  n  n-1  n-m 

ing  important  questions  arise:  (1)  when  there  are  gradient 
errors,  do  the  more  powerful  gradient  methods,  such  as: 
the  conjugate  gradient  method  and  the  Davidon  method, 
offer  advantages  or  disadvantages  over  the  simpler  gradient 
methods?;  (2)  given  that  the  convergence  of  the  exact 
gradient  method  is  assured,  under  what  conditions  (if  any)  will 
there  result  convergence  of  the  approximate  gradient  methods?; 
(3)  if  the  approximate  iteration  does  converge  numerically 
(in  general  (un>-*-u*/u* )  ,  at  what  step  should  the  iteration 
be  terminated  in  order  to  insure  a  reasonable  estimate  to  u*?; 
(3)  how  is  this  estimate  to  u*  made  and  how  suboptimal  is  u*? 
Before  answering  these  questions  some  additional  nomenclature 
and  definitions  have  to  be  introduced.  Let  ueU  denote  the 
interpolated  approximation  to  ueU,  where  from  Lemma  4.2  U  is 
a  finite-dimensional  subspace  of  U.  Let  denote  the  discrete 
approximation  to  J,  and  let  | | • | |  represent  the  norm  of  a 


vector . 
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Definition  4.1.:  (61)  If  there  exists  a  set  SNj={u:  un+1 
=un,n>N},  then  (un)  is  said  to  be  numerically  convergent 
and  SNc  is  said  to  be  the  state  of  numerical  convergence. 

Remark :  Numerical  convergence  is  different  from  the  standard 
concept  of  convergence.  This  difference  is  due  to  the  finite 
word  length  of  a  digital  computer. 

Definition  4.2.:  If  J[u*]=min  J[u],  then  the  optimal 

ueU 

control  error  | |e  | |  is  defined  as 
I  lej  hi  |u*-u*|  |  , 
where  u*=uNeSNc  . 

Definition  4.3.:  The  cost  functional  error  eT  is  defined  as 

- - — . —  J 

ej=|  J[u*]-£[E(h)u*l  |  . 

Since  J [u*]  cannot  be  computed,  the  cost  functional  error 
is  a  measure  of  the  suboptimality  of  the  approximate  solution. 

Gradient  errors  have  two  effects  on  the  gradient  itera¬ 
tion:  (1)  direction  of  search  errors  in  the  outer  loop 
iterator,  and  (2)  linear  minimization  errors  in  the  inner 
loop  iterator.  Until  more  accurate  finite-difference  methods 
are  developed,  it  appears  that  the  direction  of  search 
errors  must  be  tolerated.  However,  linear  minimization  errors 
can  be  avoided  when  the  effects  of  gradient  errors  on  this 


phase  of  the  iteration  is  understood. 

If  the  exact  gradient  of  the  cost  functional  J [ • ]  at 
un  is  given  by  gfu^),  then 

g(un)  =  g(ufi)  +  eg(un;h),  (4.14) 

where  g(un)eU  is  the  approximate  gradient  at  un ,  and 
eg(un;h)  is  the  gradient  error,  which  as  indicated  depends 
on  the  discretization  parameter  h  of  the  finite-difference 
method  used  in  computing  the  solutions  of  the  state  and  co¬ 
state  systems. 

Many  of  the  following  results  rely  heavily  on  the 
linearity  of  the  dynamical  system  and  on  the  quadratic  nature 
of  the  cost  index.  For  a  well-posed  linear  dynamical  system, 
there  exists  a  linear  transformation,  given  by  Equation  2.18, 
between  the  control  space  U  and  the  state  space  X.  In 
addition,  the  discretization  of  a  linear  continuous  dynamical 
system  results  in  a  linear  system  of  difference  equations  which 
when  solved  yields  a  linear  transformation  between  the  discrete 
space  XL  and  the  discrete  state  spaced  .  Consequently,  the 
truncation  error  (on  the  nodes),  which  is  the  difference  be¬ 
tween  these  solutions,  is  also  linear  in  the  control.  To  be 
more  specific,  let  g(u)  denote  the  discrete  approximation  of 
the  exact  gradient  g(u)  calculated  by  the  finite-difference 
solution  of  the  state  and  costate  systems,  then 
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9  (u)  =  <2  +  dy  =  <j+AE(h)u  .  (4.15) 

The  operator  &  is  linear  because  the  difference  equations 
resulting  from  the  approximation  of  the  linear  partial  dif¬ 
ferential  equations  are  linear. 

Lemma  4.6.;  If  the  dynamical  system  is  linear  and  if  J(*]  is 
a  quadratic  functional,  then  the  truncation  error  in  the 
gradient  t.g(u;h)  is  linear  in  u.  Specifically 

(u;h)  =  C  (h)  u  +  Cg, 

where  E,  (h)  =E  (h)  A-ftE  (h)  is  a  linear  operator  depending  on  the 
discretization  parameter  h,  and  8g=E(h)c-s  . 

Proof ;  The  truncation  error  in  the  gradient  is  given  by 

■«.q(u;h)  =  E(h)g(u)-$(u) 

=  E(h)  [c+Au]  -  [<8 +(XE  (h)  u  ] 

=  E(h)c-«  +  [E  (h)A-lE  (h)  )u  . 

=  «g  +  K  (h)u. 

Theorem  4.1.  :  If  the  dynamical  system  is  linear,  and  if  J  is 
a  quadratic  functional ,  then  the  approximate  gradient 
g(u)=Q^(u)  is  the  exact  gradient  (apart  from  round-off) 
of  the  quadratic  functional 

J  [u]-Jq  +  <c,u>+  i-<u,Au>, 


(4.16) 


59 


where 

c  =  Q  C  ,  and  A  =QdE(h) 

Proof :  g(u)=Q^(u) 

-Q  [»  +&E  (h)  u] 

=Q $  +  Q  &E  (h)  u  , 

which  by  inspection  is  the  gradient  of  J[u]. 


Remark:  The  inner  product  <•,•>  can  be  calculated  exactly 

(apart  from  round-off)  on  the  subspace  U. 

Theorem  4.1  is  an  important  result  because  it  implies 
that  even  though  g  is  not  the  gradient  of  J,  or 9  ?  g  is  the 
exact  gradient  of  J.  Therefore,  it  should  be  possible  to 
at  least  minimize  J.  Hopefully,  the  minimum  of  this 
approximate  problem  will  be  a  satisfactory  approximation  to 
the  true  solution. 

The  following  result  will  be  useful  in  the  analysis  of 
the  effects  of  gradient  errors  on  the  inner  loop  iterators. 


Lemma  4.7.:  Let  Yn  be  selected  such  that  J[un+Ynsnl 
<_  J[uri+ysn]  for  all  y_>  0 ,  where  sn  is  the  direction  of  search. 
Then  yn  minimizes  J  along  the  half-ray  Un+Ysn,  and  is  given 
by 


Y 


n 


<g  ,s  > 

n 


<s  (As  > 
n  n 


(4.17) 
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Proof:  Substituting  un+ysn  into  Equation  2.21  yields, 


JtuB+l]-J|u01*Y<,n'V  +  J  <sn'Asn>  ' 


The  first  derivative  of  the  cost  change  in  the  direction 
with  respect  to  y  is 


“5*"  ( J  £  u  ,,]-J[u  ])  -  <g  ,s  >+  y<s  ,As  >. 
dy  n+1  n  *n  n  n  n 

Setting  the  above  equation  equal  to  zero  and  solving  for 
y  yields 

<g„ ,s> 
y  —  _  n  n 
n  <s  ,As  > 

n  n 

The  second  derivative  shows  that  Yn  yields  a  minimum,  i.e.. 


7T<J[un+l1"Jtun1)  =  <sn'Asn>  *  TOA  •  *  sn  I  I  > 
dy 

By  applying  Lemma  4.7  it  is  easily  shown  that 


Yn  = 


<gn'V 


(4.18) 


and 


Yn  =  ~ 


<g  ,s  > 
3n  n 

<s  ,As  > 
n  n 


(4.19) 


denote  the  control  correction  parameters  defined  by  Equation 
3.2  in  the  direction  for  J  and  J,  respectively.  For  dis¬ 
tributed  parameter  systems,  the  operators  A  and  A  are, 
respectively,  multiple  integral  and  multiple  summation 
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operators.  As  a  consequence.  Equations  4.18  and  4.19  are  not 

generally  used  in  practice.  Instead  of  Equation  4.17,  the 

following  methods  are  usually  utilized  in  the  inner  loop  to 

determine  y„ : 

n 

1.  Cubic  interpolation  based  on  functional  and 
directional  derivative  information. 

2.  Quadratic  interpolation  based  on  functional  infor¬ 
mation  . 

3.  Linear  gradient  interpolation  based  on  directional 
derivative  information  (i.e.,  requla  faJ  si ) . 

In  theory  these  three  methods  yield  the  same  result. 
However,  method  1  is  generally  considered  to  be  superior  to 
the  other  two  methods  because  of  its  rapid  convergence 
properties.  When  there  exist  gradient  errors  of  sufficient 
magnitude,  this  is  not  the  case.  In  fact  numerical  results 
indicate  that  when  there  are  gradient  errors  then  method  1  is 
the  least  efficient  of  these  three  methods.  The  convergence 
of  the  inner  loop  iterator  is  essential  to  the  convergence 
of  the  outer'  loop  iterator?  hence,  the  effects  of  gradient 
errors  on  each  of  these  three  inner  loop  methods  will  be 
discussed. 

Method  1,  Cubic  interpolation  based  on  functional  and 
directional  derivative  information 

A  general  description  of  this  method  is  presented  in 
(52).  This  method  is  the  most  sensitive  of  these  three 
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methods  to  gradient  error  because  it  requires  a  close  corre¬ 
lation  between  the  gradient  and  the  functional.  Reliance  on 
both  types  of  information  (gradient  and  functional  values) 
can  cause  difficulties  if  the  relative  magnitude  of  the 
gradient  error  is  large.  One  reason  for  the  difficulties  en¬ 
countered  by  this  method  is  that  it  brackets  the  minimum  in  the 

direction  of  search  (i.e.,  the  iterator  determines  two 
12  12 

scalars  Yn  and  vn  such  that  Yn£Yn£Yn)  by  determining  when 
the  directional  derivative 


dJ  .  .  ,  . 

3—  =  <g  (u  +ys  ) 
dy  n  n 


s  > 
n 


(4.20) 


changes  sign,  i.e.,  from  negative  to  positive.  Unfortunately 
due  to  gradient  errors,  this  method  generally  does  not  bracket 
the  minimum.  This  is  illustrated  in  Figure  4.2. 
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As  is  indicated  pictorially  in  Figure  4.2  the  approximate 

directional  derivative  at  u  +yns  can  be  positive  when 

non 

actually  the  exact  directional  derivative  is  negative.  Thus, 
based  on  the  approximate  directional  derivative  this  method 
would  predict  that  the  minimum  is  in  the  interval  f  0  ,  Yq ) , 
which  is  obviously  incorrect.  This  difficulty  can  be 
corrected  by  employing  another  procedure  to  bracket  the 
minimum.  However,  this  would  only  be  a  minor  curri  since  the 
interpolation  formulas,  used  by  this  method,  are  also  based 
on  both  types  of  information.  Hence,  when  there  exist  con¬ 
siderable  gradient  errors,  this  inner  loop  iterator  is  not 
recommended. 

Method  2.  Quadratic  interpolation  based  on  functional  values 
The  fundamental  idea  underlying  this  method  is  the  obser¬ 
vation  that  the  cost  index  is  nearly  quadratic  in  y  in  the 

direction  of  search  s  near  the  minimum.  If  for  fixed  u  and 

n  n 

sn 

^  [Y]=y  [un+ysn]=  aQ+a1Y+a2Y2,  (4.21) 

then  by  computing  ^[y]  for  i=l,2,3,  a  system  of  equa¬ 

tions  is  generated  from  which  the  coefficients  of  the  assumed 
polynomial  can  be  obtained.  The  estimate  for  y  is  obtained 
from  the  equation 


^7^  =  ai+2a2Y  -  0 


(4.22) 
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from  which 

Yn  "  "V2a2  '  (‘,'23) 

If  ^  is  quadratic  then  this  method  determines  Yn  in  one 
iteration;  however,  if  p-  is  not  quadratic,  then  additional 
logic  is  required  to  determine  y  .  The  important  feature  of 
this  method  is  that  it  does  not  depend  on  the  approximate 
gradient  vector.  However,  since  g  is  not  the  gradient  of 
(g  is  the  gradient  of  J)  ,  the  directional  derivative  at  the 
minimum  of  ^  in  the  direction  sn  does  not  necessarily  vanish; 
hence,  the  subsequent  direction  of  search  is  not  a  conjugate 
direction  and  the  method  of  expanding  subspaces  does  not 
apply.  Therefore,  if  this  inner  loop  iterator  is  used  in 
conjunction  with  a  conjugate  direction  method,  then  rapid 
convergence  cannot  be  proven.  Once  again,  it  is  the  in¬ 
consistency  between  the  gradient  and  the  functional  which 
causes  the  difficulties.  Numerical  experience  indicates  that 
in  the  presence  of  gradient  errors  this  inner  loop  iterator 
combined  with  a  conjugate  direction  method  produces  slow 
convergence  near  the  minimum.  The  slow  convergence  near  the 
minimum  is  caused  by  gradient  errors  which  are  more  dominant 
near  the  minimum  (both  the  exact  and  the  approximate  gradients 
become  smaller,  in  the  norm,  near  the  minimum) .  The  main 
advantage  of  this  inner  loop  method  is  that  it  attempts  to 
minimize^  ,  which  may  be  a  better  approximation  to  J  than  is 
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J.  Methods  for  terminating  this  iteration  will  be  presented 
later,  since  the  standard  test  on  the  norm  of  the  gradient 
no  longer  applies  in  this  case. 

Method  3.  Linear  interpolation  of  the  approximate  directional 
derivative  (regula  falsi ) 

Regula  falsi  has  not  received  widespread  application  as 
an  inner  loop  iterator;  however,  it  is  probably  the  oldest 
of  these  three  methods.  This  method  is  similar  to  Newton's 
method  in  that  it  determines  the  zero  of  the  gradient  rather 
than  the  minimum  of  the  functional.  When  regula  falsi  is 
employed  as  an  inner  loop  iterator,  it  determines  the  zero 
of  the  approximate  directional  derivative;  hence,  the  minimum 
of  J  in  the  direction  of  search.  Like  method  2  this  procedure 
does  not  mix  the  gradient  and  functional  information. 

Referring  to  Figure  4.3  the  interpolation  procedure  is  as 
follows;  assume 


dJ 

dy 


ao+aiY' 


then 


dJ. 

dy  I 
1  y=a 


ao+aia ' 


and 


dJi 

dY'y=8 


VaiB 


(4.24) 


(4.25) 


(4.26) 


The  above  relations  yield  two  equations  in  the  unknowns 
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Lemma  4.8.;  If  J  is  a  quadratic  functional,  then  the  requla 

falsi  method  determines  y_  such  that 
-  n 

J(un+Ynsnl  £  Jtu^+ys^]  for  a11  Y— °* 


Proof ;  For  simplicity  let  ct=0  ,  then  Equation  4.27  yields 


vn  ■  l\'V"/l<VV‘<SIW'Vl 


(<gr,,VB)/(<gn,sn>-<s+MOn+Bsn),sn>) 


<<VVs,/(<9n'Sn>_<V6*Sn'V) 


-<q  ,S  >/<s  ,As  >, 
^n '  n  n '  n  ' 


and  the  proof  then  follows  from  Lemma  4.7. 

Thus,  the  requla  falsi  method  is  a  numerical  procedure 
for  obtaining  the  result  of  Equation  4.19. 

Theorem  4,2.:  If  A  is  a  self-adjoint  operator,  if  the  con¬ 
ditions  of  Theorem  4.1  are  satisfied,  and  if  the  requla 
falsi  method  is  used  in  the  inner  loop,  then  the  gradient 

iteration  u  ,.=G(u  ,s  ,y  ;  u  ,,...,{1  )  generates  a 

n+1  n  n'’n  n-1  n-m  ^ 

sequence  {un}  which  numerically  converges  to  the  approximate 
minimizing  element  in  U,  specifically 

J [u* ]  =  min  J  [u]  . 
ueU 
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Remark :  The  standard  convergence  proofs  for  these  three 
gradient  methods  can  be  applied  to  Theorem  4.2  (refer  to  the 
references  given  in  Chapter  III). 

Corollary  4.1.:  If  the  conditions  of  Theorem  4.2  are  satis¬ 
fied  then  both  the  conjugate  gradient  method  and  the  Davidon 
method  converge  in  a  finite  number  of  iterations  to  the 
minimum  of  J. 

Proof:  The  proof  follows  from  the  fact  that  U  is  a  finite- 
dimensional  subspace  of  U,  and  from  the  application  of  the 
results  contained  in  (41)  and  (50). 

Theorem  4.2  is  significant,  since  it  implies  that  {u^} 
generated  by  G  minimizes  J  and  not  J  nor  ^  .  Corollary  4.1  is 
important,  since  it  insures  convergence  of  the  conjugate 
gradient  method  and  the  Davidon  method  in  a  finite  number 
of  iterations. 


Error  Estimates 

Since  {u  }  does  not  minimize  J,  it  is  desirable  to  com- 
n 

pute  the  optimal  control  error.  This,  however,  is  impossible 
without  prior  knowledge  of  the  solution  u* .  Nevertheless, 
estimates  of  the  optimal  control  error  may  be  obtained  by 
means  of  the  results  given  below. 
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Lemma  4.9.:  Let  VuJ  denote  the  gradient  of  a  positive 
definite  quadratic  functional  J  defined  on  a  real  Hilbert 
space  U,  and  generated  by  a  self-adjoint  operator  A,  then: 

2  2 

(i)  Jg (u)=] | tfuJ ||  3 | | g (u) | |  is  also  a  quadratic 
functional  with  Hessian  2AA, 

(ii)  The  set  defined  by  Sg={u:  | | g (u) | | 2=*c}  is  a 
hyperellipsoid  in  the  Hilbert  space  U.  and, 

(iii)  If  J  [u*]=min  J  [ u ] ,  then  u*=u*. 

^  9  ueU  ^  " 

Proof:  (i)  J  [u]  =  |  |  g(u)  |  |  =<g  ,g>  =  <c+Au,  c+Au> 

*  <c,c>+2<c,Au>+<Au,Au>  . 

Since  A=A* ,  it  follows  that  <Au,Au>  =  <u,A*Au>=<u,AAu> . 

Thus,  J  [u]=<c,c>+2<c,Au>+  y<u,2AAu>,  and  the  Hessian  is  then 
2AA. 

(ii)  Jg  is  quadratic  from  (i) . 

(iii)  7  J  =2Ac+2AAu;  hence,  V  J  =  0  implies  that  AAu=-Ac 

u  y  \A  y 

and  that  u*=-A  ^Ac=-A  c=u* . 

g 

Lemma  4.10.:  Consider  the  translation  defined  by  Q=u-u*, 
where  3 [u] -J [u] .  Then: 

(i)  J[u)=J(u*]+  j<Q,AQ>. 

(ii)  7aJ[Q]=VuJ[u]  . 


Proof :  (i)  J [ Q] = J [ u] =J [u+u* ] 


=  Jq+<C/U+u*>  +  y<u+u* ,A (G+u*) > . 

Expanding  and  using  the  representation  of  the  solution 

u*  =  -A “i, 

yields 

Jtu]-J0-  ~<c ,A~c>  + 

Now  it  is  easily  shown  that 

J (u*]=J0+<cfu*>  +  j<U*,Au*> 

=J0-<c,A_c>  +  j<A~c,AA"c> 

-  JQ  -  i-<C,A_i>  , 

and  thus , 

J[u]=J[u*]  +  |-<u,Au>. 

(ii)  V^j*=Au=A  (u-u*)  =Au-Au*=AU-A  (-A  c) 


=Au+c=V  J . 
u 


Theorem  4.3.:  The  vectors  defined  by  | |uu| 

2  2 
and  | | u, | |  =min | | Q | |  are  eigenvectors  of  A 

1  ueS~ 

9 

2  2 

eigenvalues  M.  and  m. ,  respectively. 


2=max  |  |  0 1  |  *■ 

GeS* 

(i.e.,  AA)  with 
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Proof ;  By  using  a  Lagrange  multiplier  X  the  proof  of  this 
theorem  can  be  formulated  as  an  optimization  problem,  i.e., 
extremize  | |u| |  subject  to  the  constraint  S^.  This 
constrained  problem  can  be  reformulated  as  an  unconstrained 
problem  by  considering  the  functional 

f  [Q,X]  =  |  |  Q  [  |  2  +  X  (c-1  |  At3 1  |2)  . 

Then,  the  gradient  of  f  is  given  by 


Vf[G,X]  = 


af 

3Q 

3  f 

2u-2XA  G 


where  differentiation  is  in  the  Fredhet  sense. 

By  setting  \7f=0,  one  obtains 

.2  A  1  A 
A  U  =  -^  U  . 

2 

Therefore,  the  vectors  u  and  u,  which  extremize  ||u|| 

u  1 

2 

subject  to  the  constraint  S~  are  eigenvectors  of  A'.  Since, 

2 

and  m  are  spectral  bounds  for  A,  it  follows  that 

2  2  2 
and  mA  are  spectral  bounds  for  A  (since,  A  e=AAe=AXe=XAe 

=XXe=X2e) . 

Theorem  4.4.:  Let  |  jgfu^)! ]  denote  the  exact  normed  gradient 
squared  of  J [ • ]  at  u^.  Then, 


I^V 


u*-u* 


g(v 
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Proof :  Let  u=u^-u*  and  note  that  since  A  is  self-adjoint,  so  is 

AA.  By  Lemma  4.10,  I  I g (u^  |  | 2= j  | g (Q)  |  | 2= | | aQ | | 2=<u , A2fl> .  Thus 
from  Theorem  4.3, 

|fl| |2  <  <Q,AA0>  <  M2|  | Q | |2, 

and 


which  yields  the  desired  result  after  taking  the  square  root 
of 


g(V 


m: 


% 


-u* 


I  Ig^)  I  I 

- 2 - 

m„ 


From  Theorem  4.2,  {un>  minimizes  Jj  hence,  (g(un)}-*-0.  This 
resuics  in  a  method  by  means  of  which  ]  jg(u^) | |  can  be  esti¬ 
mated  . 


Theorem  4.5.:  In  the  state  of  numerical  convergence  (see 
Definition  4.1),  the  approximate  gradient  methods  re¬ 
sented  by  un+1=G((in,Sn,Yn;  ,  .  .  ,  u^)  with 

Y  =-<g  ,s  >/<s  , As  >,  insure  that  g(u.1)=0.  Thus, 

'n^n  n  n  n  ^  N 

I  Ig^)  l|  =  |  leg  (*\|<‘h)  |  |  • 

Proof :  From  Equation  4.14  g (un) (un) +6g(un;h) .  Now  {unl 

minimizes  J  (Theorem  4.2)  which  implies  that  {g(u  )}-*-0  with  n. 
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Thus  g  (&n)=eg  (ur  ;h)  for  all  n>_W. 

Combining  Theorems  4.4  and  4.5  an  estimate  of  the  optimal 
control  error  is  obtained  by  means  of  the  following 
Corollary . 

t  h 

Corollary  4.2:  Let  egfu^ph)  be  the  gradient  error  at  the  N 

iteration  of  Q  ,  ,  =G  [ii ,9  ;  fi  1  .  Then  the 

n+i  n  n  n  n-J.  n-m 

estimate 

|  |eg(uN;h)  |  |/Ma  <  ||u*-u*||  <  I  |eg  (uN ;h)  |  | /nA# 
is  obtained 

Proof :  The  proof  follows  immediately  from  Theorems  4.4  and  4.5. 

Unfortunately,  only  the  projection  of  the  gradient  error  on 
the  subspace  of  interpolating  functions  can  be  obtained  on 
the  computer.  Let  PegeU  be  the  projection  of  egeU  on  the 
subspace  0,  and  let  U1  denote  the  annihilator  of  U.  From 
the  Projection  Theorem,  (refer  to  Appendix  A)  the  gradient 
error  is  given  by 

eg(un;h)  =  Peg  +  Y,  (4.28) 

where  YeU'L.  From  the  triangle  inequality  it  follows  that 

I |eg(un;h)  | |  <  |  j  Peg |  |  +  I  M |  ,  (4.29) 


and  the  estimate 
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I  |u*-u*j  |  <  ( [  |  P  egl  I  +  I  1*1  l>/"#A 


(4.30) 


holds . 

From  tne  Projection  Theorem,  U+0X=U,  and  thus,  as  the 
dimensionality  of  0  increases  (refinement  of  the  discreti¬ 
zation)  the  quantity  ||y||+0.  Equation  4.30  yields  a 
practical  method  by  means  of  which  the  optimal  control  error 
can  be  estimated. 


If  method  2  is  utilized  in  the  inner  loop  iteration  then 


{g(un)^°/  and  thus,  another  method  for  estimating 
| Igfu^) | |  is  requ  red.  Let  Yn  be  the  control  correction 
parameter  for  in  the  direction  §n .  Since,  method  2 

minimizes  <j)[u]  in  this  direction,  Vn  is  then  given  by 


<QEgn,QEin> 

<QEsn,oaEin> 


(4.31) 


If  the  steepest  descent  method  is  employed,  or  if  the  other 
gradient  methods  are  restarted  each  time  an  up-hill  direction 
of  search  occurs,  then  y  -*■{).  As  noted  before  this  inadvertent¬ 
ly  creates  a  fixed  point  for  the  iteration,  without  causing 
the  gradient  to  vanish.  In  addition,  since  eventually 
becomes  small,  slow  convergence  results.  This  proper '-y  has 
been  noted  in  numerical  results  (57).  Termination  of  the 
iteration  occurs  when 


<QEgn,QEgn>  =  0. 


(4.32) 
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This  implies  that 


<QE(gn+eg);  QEgn>  =  G,  (4.33) 

and  since  QEgn=gn,  Equation  4.33  yields 

<g  ,g  >  =  -<QEe^  ,g  >  .  (4.34) 

g  ’n 

Now  consider  the  relations, 

!lQEgnM2  =  <QEgn,QEgn>  =  <QE (gn+eg) ,QE  (gn+eg) > 

-  <gn+QEeg,gn+QEeg> 

=  llgnl I2  +2<gn ,QEeg>+ | |0Eeg| |2.  (4.35) 

Substitution  of  Equation  4.34  into  Equation  4.35  yields 

| |QEgn| !2=||QEeg| |2-| |gnl |2.  (4.36) 

Using  the  Projection  Theorem  and  the  triangle  inequality  one 
obtains  the  estimate 


IgJ  I  1  I  |0Egn  I  |  +  |  |Y|  |  ,  YeU-1  . 


(4.37) 


Thus,  the  estimate 


,,  MlQMjr-llgJlV  +  ||Y| 

«*-»*  I  I  i  - - - 


m. 


(4.38) 


is  obtained  for  the  case  where  method  2  is  used  as  the 
inner  loop  iterator. 

An  estimate  of  the  cost  functional  error  can  also  be 
given  in  terms  of  the  gradient  error  and  the  spectral  bounds. 
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Theorem  4.6;  Let  J  be  a  quadratic  functional  defined  on  U 
generated  by  a  self-adjoint  operator  A  and  by  an  .inner 
product  .  Let  J  be  the  approximation  of  J 

defined  on  the  subspace  0  generated  by  A  and  by  the  inner 
product  { • , • )  .  Then 


Proof :  Let 

J[u*]=JQ+<c,u*>  +  i<u* ,Au*> 

and 

J [ u* ] — J  0+ ( c » u* )  +  i(u*,Au*)  . 

The  relation 

“<u*  ,Au*>  =  j<g(u*),u*>  -  ^-<u*,c>, 
implies  that 

J[u*]=Jq+  i<c,u*>  +  ~<u*,g(u*)>  . 

=JQ  +  i<c+ec ,u*+ (u*-u* ) >  +  i<u*,g(u*)>. 

Expanding  the  above  equation,  taking  into  account  the  errors 
due  to  the  approximate  inner  products,  and  taking  the  absolute 
value,  yields  the  estimate 
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J [u* ]-J [u*] 1 

<  1  e  t 

Jo 

|  +  J  | <c,u*-u*> | 

+  |l 

<ec,u*> 

-  J 1 

<e  ,u* 

-u*>  1  +  he  1|  + 

i  |  e 

2  | 

2  1 

c' 

'  2 1  s  1 

P 

2  s 

P 

♦y  I 

<u*-u 

* , g (u* ) > |  +  ~\  <u* 

,ey>  1 

• 

Using  Schwarz's  inequality  and  Corollary  4.2,  and  by  grouping 
terms  properly  the  desired  estimate  is  obtained. 

In  practice  <^[Eu]  is  computed  rather  than  J[u];  however, 

|J[u*]-J[Eu*]  |  <  |  J[u*]-J[u*]  |  +  ( e~  |  ,  (4.39) 

where 

e j  =  3 [ u ] - J [ Eu ] . 

The  error  can  be  eliminated  (apart  from  round-off)  by 
the  proper  selection  of  the  quaradure  formulas.  For  example, 
if  piecewise  quadratic  interpolation  is  employed  to  determine 
function  values  between  the  node  points,  then  the  use  of 
Simpson's  quadrature  formula  over  each  partition  insures  that 
ej=°. 

Determination  of  the  Parameters  in  the 
Error  Estimates 

The  estimates  of  the  optimal  control  error  and  the  cost 
functional  error  are  based  on  the  gradient  errors  and  on 
the  spectral  bounds  mA  and  .  Hence,  in  order  to  use  these 
estimates  methods  for  obtaining  these  quantities  are  required. 
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Gradient  error 

At  the  present  time  two  methods  have  been  utilized  for 
estimating  | | eg | | :  (1)  error  bounds  in  terms  of  higher 
order  difference,  and  (2)  asymptotic  extrapolation.  Since 
the  first  method  is  problem  dependent  and  also  conservative 
only  the  second  method  will  be  discussed  here. 

Asymptotic  extrapolation  is  an  attempt  to  actually  com¬ 
pute  the  gradient  error.  It  is  based  on  the  fact  that  if  the 
approximation  to  the  gradient  is  of  order  p,  then 

g(uN)  =  gfu^h)  +  hPeg(uN)  +  0(uN;hP+1),  (4.40) 

where  eg  is  defined  as  the  magnified  error  function. 

Solving  for  the  gradient  g (u N)  by  using  stepsizes  of  h  and 
qh,  respectively,  0<q<l,  two  equations  in  g(uN)  and  eg(uN) 
result,  which  when  solved  yield 

||eg(uN:h)||  -  [1/ (l-hP)  ]  |  |  g  (^,-qh) -g  (uN;h)  |  |  .  (4.41) 

In  view  of  Theorem  4.2,  g(uN,h)=0,  and  thus,  the  estimate 

| |eg(uN;h) | |  =  (l/(l-hp)] | |g(uN;gh) i |  (4.42) 

for  the  norm  of  the  gradient  error  is  obtained. 

For  the  class  of  problems  considered  the  lower  spectral 
bound  can  be  determined  analytically.  The  general  form 
of  the  Hessian  operator  for  this  class  of  problems  is 


A  =  al  +  T*T  . 


(4.43) 
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By  Lemma  A-l,  T*T  is  self-adjoint  with  a  lower  spectral  bound 
of  zero.  In  addition  from  Lemma  A-2,  a+T*T  is  also  self- 
adjoint  with  mA=a.  This  analytical  result  is  certainly  an 
advantage  in  performing  the  error  analysis  on  quadratic 
programming  problems.  However,  in  many  cases  it  is  not 
possible  to  determine  analytically  either  the  Hessian  operator 
A,  or  its  spectral  bounds.  For  example,  in  non-linear 
problems  the  operator  A  does  not  appear.  However,  if  the 
quadratic  approximation  is  valid  near  the  minimum  of  a  non¬ 
quadratic  functional  (which  is  at  least  convex),  then  Davidon's 
method  presents  a  numerical  procedure  by  means  of  which  an 
approximate  Hessian  and  its  spectral  bounds  can  be  obtained. 

Theorem  5.7.;  (50)  Let  A  be  a  Hessian  operator  defined  on  a 

real  separable  Hilbert  space  U.  Then  there  exists  a  subspace 
UCU  such  that 

n  -i 

Hu  =  A  u  as  n-*°°, 

for  all  usU,  where  {Hn}  is  the  Davidon  deflection  operator 
defined  by  Equations  3.11  through  3.16. 

Corollary  4.3.:  Let  | | Hn+u-Hu)  j <e  for  all  n>N,  and  let 
rru!  and  denote  the  smallest  and  the  largest  eigenvalues  of 

H  H 

N 

H  ,  respectively.  Then 
mA  *  1/mJ 
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and 

ma  *  1/ml  • 

Proof:  HNA  ~  I  implies  that  =  1  = 

Unfortunately,  Hn  rather  than  Hn  is  obtained  on  the  computer, 

~  n  ~  i 

where  HnA-+T .  Thus,  the  previous  error  estimates  are  valid 

N 

only  in  those  cases  where  mA£m^=l/Mg  .  If  m^  decreases  under 
a  refinement  of  the  mesh,  then  one  might  possibly  consider 
using  m^  in  the  error  analysis.  However,  this  would  probably 
produce  a  more  conservative  error  estimate. 

Geometric  Interpretation  of  the 
Error  Bounds 

Due  to  gradient  errors,  one  should  not  expect  to  obtain 
the  true  solution  of  an  optimization  problem  when  gradient 
methods  are  employed.  Thus,  estimation  of  the  errors 
become  an  important  part  of  the  solution.  The  error  esti¬ 
mates  presented  in  this  chapter  rely  heavily  on  the  quadratic 
properties  of  the  cost  index.  These  estimates  are  based  upon 
the  following  geometrical  considerations.  Assume  a  gradient 

method  is  employed  which  ensures  the  vanishina  of  the  approxi- 

t  H 

mate  gradient.  Then  the  true  gradient  at  the  N  iteration 
becomes  equal  to  the  gradient  error.  Obviously,  if  the 
gradient  error  can  be  calculated,  then  it  is  possible  to 

^Assuming  that  method  3  is  used  in  the  inner  loop. 
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cent  inm*  the  i  to rat  ion.  However,  in  general  it  is  much 

easier  to  estimate  the  norm  of  the  gradient  error  than  the 

gradient  error  itself.  If  only  the  norm  of  the  qradient  error 

is  known,  then  it  is  impossible  to  continue  the  iteration 

because  of  the  lack  of  a  direction  in  which  to  proceed. 

Now  assume  that  |  |e  |  |  can  be  computed.  From  Lemma  4.9,  the 

q 

2 

set  S  ={u:  | | g | |  =c}  is  a  hyperellipsoid,  which  if 

orientated  properly  would  have  its  center  at  u*.  However, 
since  only  | |g| |  can  be  estimated,  it  is  not  possible  to 
determine  this  direction.  Nevertheless,  the  true  solution  u* 
must  be  contained  in  a  hypersphere  which  has  a  radius  equal 
to  the  semi-major  axis  of  the  constant  gradient  (at  uN) 
hyperellipsoids.  These  considerations  are  illusted  in  Figure 
4.4. 


LOCUS  OF  THE  CENTERS 
OF  THE  CONSTANT 


Figure  4.4.  Geometrical  interpretation  of  the  optimal 


control  error 


If  the  ratio  is  large,  then  these  error  estimates  be¬ 

come  conservative.  In  theory  an  inner  hypersphere  can  be 
constructed  based  on  the  minor  axis  of  these  hyperellipsoids. 
However,  the  methods  used  in  estimating  the  parameters  in  thes 
error  estimates  makes  these  lower  bounds  questionable.  It  is 
worthwhile  to  note  that  the  constant  J  contours  are  in  general 
translated  and  deformed  because  of  gradient  error.  This  is 
illustrated  in  Figure  4.5.  The  fact  that  the  approximate 
gradient  algorithms  only  solve  the  problem  in  a  subspace  U 
of  U  is  illustrated  in  Figure  4.6. 


Figure  4.5.  Constant  cost  contours 


B3 


Figure  4.6.  Minimization  on  a  subspace 
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CHAPTER  V.  NUMERICAL  RESULTS 

All  computations  reported  in  this  dissertation  were  per¬ 
formed  on  the  IBM  360  Model  65  digital  computer  usinq  the 
Fortran  IV  language  with  double-precision  arithmetic.  Compu¬ 
tation  times  quoted  are  the  time  used  by  the  Central  Proces¬ 
sing  Unit  (CPU)  during  program  execution.  Although  the 
Central  Processing  Unit  time  is  the  best  measure  of  the 
computing  effort  required,  it  is  not  precisely  reproducible 
on  identical  programs  due  to  the  multi-programming  feature  of 
the  system.  Storage  requirements  reported  are  in  terms  of 
array  area  used  in  BYTES,  which  does  not  include  object  code 
storage  requirements. 

The  solution  of  the  state  and  costate  partial  differential 
equations  were  performed  with  a  standard  second  order 
symmetric  finite-difference  algorithm  (62).  The  multiple 
quadrature  algorithm  used  in  computing  the  cost  functional 
and  the  inner  products  was  based  on  the  Gauss-Legendre  ten 
point  quadrature  formula  (53) .  Piecewise  continuous  quadratic 
polynomials  were  used  to  obtain  function  values  between  the 
node  points.  All  three  types  of  inner  loop  iterators 
described  in  Chapter  IV  were  employed;  however,  only  the 
results  obtained  with  method  3  are  presented.  Numerical  re¬ 
sults  for  the  modified  conjugate  gradient  method,  the 
Davidon  method,  and  the  standard  "best  step"  steepest  descent 
method  are  presented  and  compared  in  Example  5.1.  Since  the 
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conjugate  gradient  method  proved  superior  in  terms  of  CPU 
time  (hence  less  computer  costs)  it  was  utilized  on  Examples 
5.2  and  5.3.  In  Examples  5.2  the  constrained  distributed 
control  problem  is  considered.  Ekample  5.3  presents  results 
for  the  boundary  control  problem.  The  three-dimensional 
figures  presented  were  generated  by  the  Cal-Comp  Digital 
Incremental  Plotter  with  a  subroutine  developed  by  the  Iowa 
State  University  Computation  Center. 


Example  5.1.:  The  unconstrained  distributed  control  of  the 
vibrating  string 

The  unconstrained,  fixed  time,  penalized,  minimum  energy 
distributed  control  of  the  vibrating  string  is  considered. 

The  problem  may  be  stated  as  follows: 
minimize : 


J  [u^]=a 
subject  to: 


R 


F  x2 (r ,Tf )dr  +  B 


0  JQ 


f  ud2 (r , t) drdt ,  (5.1) 


Sx (r , t) =ud (r , t) , 


x(r ,0)=xQ (r) , 
xfc (r,0)=0, 
x (0 , t) =0  , 
x (1 , t) =0  , 


x(r,t) 


(5.2 


where  S  = 


at' 


3r 


2  ,  xQ  (r)  =  sinirr ,  a=2 ,  6=0.5,  Rp=l,  and 


Tp=4 , 


The  initial  and  the  boundary  conditions  are  illustrated 
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in  Figure  5.1. 

A  physical  interpretation  of  the  cost  functional  can  be 

obtained  if  the  inner  product  for  the  Hilbert  space 
2 

L  [[0/l]x[0,4]]  is  introduced.  Let  the  inner  product  be 
given  by 


<u,v> 


rf 

u (r , t) v (r , t) drdt  . 

0 


(5.3) 


The  norm  is  then 
I  I u |  |  =  /<u,u>  =  ( 


0  1 


R. 


u  (r ,  t.)  u  (r ,  t)  drdt ) 


(5.4) 


With  this  notation  the  problem  may  be  restated  as  follows: 

o 

determine  the  distributed  control  ud(r,t)eL  [ [0 , 1] x [0 , 4 ] ]  which 
minimizes  the  sum  of  the  magnitudes  of  two  vectors  ,  (1) 
the  magnitude  of  the  deviation  of  the  string  from  the  equil¬ 
ibrium  position  at  the  final  time,  and  (2)  the  magnitude  of 
the  control  effort. 

For  the  purposes  of  illustrating  the  general  theory 
developed  in  Chapter  II,  this  problem  will  be  recast  in  the 
form  given  by  Equation  2.20. 

It  can  be  shown  (see  Appendix  A)  that  the  Green's 
function  for  this  problem  is  given  by 

00 

o 

Gp(r,t,6,T)  =  E  sinkTT(t-x)  sinkm^  sinkirr.  (5.5) 

k=l 


Thus  the  formal  solution  at  the  final  time  is 
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x  (r,Tf ) 


R„ 


GF(r,Tf ;C,tQ)x0 (£)d£ 


t  ft  i 


0 


0 


GF(r,Tf  ,e#t)ud  (?,x)dl-dT(  (5.6) 


which  according  to  the  notation  developed  in  Chapter  II 
becomes 


x(r,Tf)  =  $  (Tf )  Xq  +  S  ^Tf)ud, 


(5.7)  tf 


and  as  a  result  the  cost  index  is  then 
JtuJ 


T2-!  l<f(Tf)x0  +  S_  ^Tf )  |  I2  +  3  I  |  ud  I  I2 


(5.8) 


By  expanding  Equation  5.8  in  terms  of  the  inner  product,  and 
by  employing  the  definition  of  the  adjoint  operator 
(<Sx ,y>  =  <x,S*y>)  Equation  5.1  becomes 


J [ud5=J0+<c,ud>  +  2<ud'Aud>' 


where 


and 


Jo  ■  57  I l*(T£>xoi I2' 


C  =  ~  [S  ^Tf )  1  %  (Tf  )xQ  , 

A  =  2B  +  |j[S  "(Tf )  )  *  [S  Crf)  ] 


(5.9) 


(5.10) 


(5.11) 


(5.12) 


Substitution  of  3= -  5 ,  a=2,  and  Tf=4  into  Equation  5.12 


yields 
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in  Figure  5.1. 

A  physical  interpretation  of  the  cost  functional  can  be 

obtained  if  the  inner  product  for  the  Hilbert  space 
2 

L  [ [0 ,1] x [0 , 4 ] ]  is  introduced.  Let  the  inner  product  be 
given  by 


<u ,  v> 


T  t  R 

F  u (r , t) v (r , t) drdt  . 

0  J  0 


(5.3) 


The  norm  is  then 

I  I U I  1  =  /<U,U>  = 


1 

u (r , t) u (r , t)drdt) 2  . 


(5.4) 


With  this  notation  the  problem  may  be  restated  as  fellows: 

2 

determine  the  distributed  control  ud(r,t)eL  [  [0 , 1] x [0 , 4 ) ]  which 
minimizes  the  sum  of  the  magnitudes  of  two  vectors,  (1) 
the  magnitude  of  the  deviation  of  the  string  from  the  equil¬ 
ibrium  position  at  the  final  time,  and  (2)  the  magnitude  of 
the  control  effort. 

For  the  purposes  of  illustrating  the  general  theory 
developed  in  Chapter  II,  this  problem  will  be  recast  in  the 
form  given  by  Equation  2.20. 

It  can  be  shown  (see  Appendix  A)  that  the  Green's 
function  for  this  problem  is  given  by 


Gp (r , t, £  ,x ) 


2 

I  y—  sinkir(t-t)  sinkir^  sinkirr.  (5.5) 
k=l  K7T 


Thus  the  formal  solution  at  the  final  time  is 
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yields 
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A  =  1  +  [s“^Tf ) ] ^ ) J ,  (5.13) 

and  thus  by  Lemma  A- 5/  m^=l.  The  gradient  of  J  is  given  by 

g ( u )  =  c  +  Au 

=  [S~^Tf )  ]%  (Tf  )xQ  +  [1+[S  ^Tf)  ]  *  [S_;lTf)  ]  ]ud 

—  1  *  “1 

=  ud+[S  (Tf ) ]  [*(Tf)x0+S  (Tf)ud]  (5.14) 

=  ud+ts”^Tf)]*[x(r/TfJ  . 

=  ud+(<3>(Tf)]  [4x(r,Tf)J. 

Equation  5.6  could  be  used  to  compute  the  cost  index  and 
Equation  5.14  could  be  employed  to  compute  the  gradient; 
however,  a  brief  numerical  study  of  the  convergence  of  the 
series  in  Equation  5.5  indicated  that  a  finite-difference 
method  is  much  more  efficient. 

A  summary  of  the  defining  equations  and  their  discrete 
approximations  is  given  in  Table  5,1. 

The  results  of  the  solution  of  this  problem  by  the  con¬ 
jugate  gradient  method  (modified) ,  the  steepest  descent  method, 
and  the  Davidon  method  are  presented  in  Table  5.2.  These  re¬ 
sults  indicate  that  for  this  problem  the  convergence  of  the 
Davidon  method  is  superior  to  the  other  two  methods.  In 
addition  it  is  apparent  from  Table  5.2  that  both  of  the 
second  generation  methods  offer  a  substantial  improvement 
over  the  standard  "best  step"  steepest  descent  method.  This 
can  be  seen  by  comparing  the  approximate  gradient  magnitudes 


Table  5.1  (Continued) 
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(columns  2,  4,  and  6  of  Table  5.2)  at  each  iteration.  However, 
the  Davidon  method  required  in  excess  of  150%  more  CPU  time 
than  the  modified  conjugate  gradient  method.  In  addition  the 
Davidon  method  required  200%  more  array  storage  than  did  the 
modified  conjugate  gradient  method.  Thus,  at  least  for  this 
problem  the  modified  conjugate  gradient  method  appears  to  be 
the  most  efficient  of  these  three  methods  with  respect  to 
computer  run-time  and  storage  requirements.  In  large  practical 
problems  the  run-time  and  storage  benefits  of  the  modified 
conjugate  gradient  method  would  become  an  even  greater 
advantage  of  the  method.  Since  each  inner  product  calculation 
is  essentially  a  double  numerical  quadrature,  the  excessive 
CPU  times  of  the  Davidon [q]  method  can  probably  be  attributed 
to  the  large  number  of  inner  products  required  by  the 
algorithm.  However,  it  might  be  possible,  it  extreme  care 
is  taken  in  programming,  to  make  the  Davidon [q]  method 
competitive  (with  respect  to  storage  and  CPU  time)  with  the 
modified  conjugate  gradient  method. 

The  results  presented  in  columns  1  and  5  of  Table  5,2 
indicate  that  the  discrete  approximation  of  the  cost  function¬ 
al  J[«]  given  by  ^  [ . ]  does  not  decrease  monotonically ,  as  the 
conventional  optimization  theory  predicts,  but  rather 
increases  after  the  third  or  fourth  iteration.  This  apparent 
contradiction  is  explained  by  the  approximation  theory 
developed  in  Chapter  IV,  which  showed  that  the  numerical 
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sequence  (un)  generated  by  the  approximate  gradient  algorithms 
minimizes  J [ • ]  not ^  [ .  ]  ,  and  certainly  not  J[«].  Thus,  it 
is  entirely  possible,  within  the  context  of  the  approximation 
theory,  for  {y[unl}  not  to  be  monotonically  decreasing.  The 
fact  that  {u  }  minimizes  J[»]  is  evident  from  the  decreasing 
magnitude  of  the  approximate  gradient  II 9n II  (columns  2  and 
6  of  Table  5.2).  This  brief  discussion  illustrates  the  im¬ 
portance  of  understanding  the  effects  of  gradient  errors  on 
gradient  methods. 

The  results  of  the  error  analysis  are  also  presented  in 
Table  5.2.  These  results  indicate  that  either  the  error 
bounds  are  conservative  or  else  there  are  considerable  errors 
introduced  by  the  various  approximations  involved  in  the 
numerical  solution.  From  the  results  given  in  Table  5.2  it 
is  observed  that  the  optimal  control  error  is  of  the  same 
order  of  magnitude  as  the  norm  of  the  approximate  optimal 
control.  In  this  case  it  is  felt  that  this  does  not  indi¬ 
cate  a  conservative  error  bound,  but  rather  that  there  is 
considerable  error  in  the  approximate  optimal  control.  This 
conclusion  is  based  on  the  observation  that  after  a  refine¬ 
ment  of  the  relatively  coarse  mesh,  used  in  the  finite- 
difference  solution  of  the  state  and  costate  systems,  the 
approximated  gradient  magnitude  increased  sharply.  This 
indicates  substantial  gradient  errors,  in  which  case  large 
optimal  control  errors  are  expected.  It  also  indicates  that 
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for  this  problem  piecewise  quadratic  functions  may  not  be 
the  best  selection  for  the  interpolating  functions.  Piece- 
wise  linear  approximating  functions  were  tried  but  as  ex¬ 
pected  gave  even  larger  estimated  control  errors.  Due  to  the 
nature  of  this  particular  problem, trigonometric  approximating 
functions  would  be  the  obvious  logical  choice.  A  discussion 
of  this  consideration  will  be  deferred  until  the  other  examples 
are  considered. 

The  cost  functional  error  estimate  is  obviously  con¬ 
servative.  This  of  course  can  be  explained  by  the  methods 
used  in  deriving  this  estimate  (i.e.,  the  triangle  in¬ 
equality,  Schwartz's  inequality,  etc.). 

The  initial  guessed  distributed  control,  the  initial 
s  t 

trajectory  (1  component) ,  the  numerically  converged  approxi¬ 
mate  optimal  control,  and  the  corresponding  optimal  trajectory 
are  depicted  in  Figure  5.2,  (a),  (b) ,  (c) ,  and  (d) ,  respective¬ 
ly. 


Example  5.2.:  The  constrained  distributed  control  of  the 
vibrating  string 

The  constrained,  fixed  time,  fixed  terminal  state 
(partial) ,  minimum  energy  distributed  control  of  the  vibrating 
string  is  considered.  The  problem  may  be  stated  as  follows: 


minimize 

Jfudl  =  B 


Tff^F  2 

ul(r,t)drdt,  / 

0  a 


(5.15) 


(a)  THE  GUESSED  INITIAL  CONTROL 


(b)  THE  MOTION  OF  THE  STRING 
DUE  TO  THE  INITIAL  CONTROL  «v 


(«!}  THE  OPTIMAL  TRAJECTORY 

CORRESPONDING  TO  w*  (c)  THE  APPROXIMATE  OPTIMAL  CONTROL  u#(r,0 


Figure  5.2.  The  solution  to  the  unconstrained  minimum  energy 
distributed  control  of  the  vibratinq  string 
(R-  =  1.0  and  T,  =  1.0) 
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subject  to 

Sx(r,t)=ud(r,t)  , 
x  (r, 0)=Xq  (r)  , 
x{r,Tf )=0 , 
xfc  (r  ,0 ) =0  , 

X  (0  ,  t)  =0  , 
x (1 , t) =0  / 

,2 


x(r,t) 


(5. 16) 


where  S  = 


— »•  ,  xn  (r)  =sin7rr ,  ct=2,  6=0.5,  Rv=l,  and  Tf=4. 
9t“  3r  u 

The  initial,  final,  and  boundary  conditions  are  illustrated 
in  Figure  5.3.  The  primary  difference  between  this  example 
and  the  previous  one  is  that  in  this  case  the  terminal  con¬ 
dition  x(r,Tf)=0  is  included.  Since  this  terminal  constraint 
coupled  with  the  dynamical  system  constitutes  a  constraint 
in  the  control  space  U,  this  problem  is  not  directly  solvable 
by  the  gradient  methods.  Thus  the  penalty  function  method  is 
employed  to  alter  the  form  of  the  problem  by  replacing  the 
constrained  problem  by  an  approximate  unconstrained  problem. 
The  introduction  of  the  penalty  function  to  account  for  the 
terminal  constraint  yields  a  new  cost  functional 


fR 

JPtud]=a»  I 


n 


J  0 


F  x2 (r,Tf )dr  +  j [ud] , 


(5.17) 


where  the  penalty  constant  an  is  arbitrarily  chosen.  The 
defining  equations  and  their  discrete  approximations  are  then 
exactly  the  same  as  in  Example  5.1,  and  are  given  in  Table 
5.1.  The  Sequential  Unconstrained  Minimization  Procedure  is 
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to  solve  a  sequence  of  unconstrained  problems  which  converge 
to  the  solution  of  the  constrained  problem. 

Results  of  the  solutions  by  the  modified  conjugate 
gradient  method  with  increasing  penalty  constants  an  are 
presented  in  Table  5.3.  The  initial  guessed  control,  the 
initial  trajectory,  the  numerically  converged  approximate 
optimal  control,  and  the  corresponding  optimal  trajectory 
for  an=l00  are  depicted  in  Figure  5.4,  (a),  (b) ,  (c) ,  and  (d) . 
The  results  of  the  iteration  resulting  in  Figure  5.4  are 
given  in  Table  5.4.  From  the  results  presented  in  Tables 
5.3  and  5.4,  and  from  the  solution  illustrated  in  Figure  5.4, 
it  appears  that  the  penalty  function  method  offers  a  practical 
means  for  solving  constrained  problems  of  this  type. 

Table  5.3.  Penalty  constants  for  the  solution  of  the  con¬ 
strained  vibrating  string  problem 


Penalty  Constant 
w 

J=J  -P[x] 

P 

Constraint 

Error 

2 

0.24954254x10° 

0. 14859240x10° 

5 

0. 55010654x10° 

0. 52410559X10"1 

50 

0. 10938682X101 

0. 10421661xl0~2 

100 

0.11454726X101 

0. 27283289X10-3 

500 

0.11894317X101 

0.11353210xl0~4 

1000 

0. 11957950X101 

0. 28465427xl0“5 

w 

(e)  THE  APPROXIMATE  OFTIMAL  CONTROl 


(4)  THE  OPTIMALTRAJECTOIY 


Figure  5.4. 


The  constrained  minimum  energy  distributed  control 
of  the  vibrating  string,  rx„  =  10D 
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Table  5.4.  The  solution  of  the  constrained  vibrating  string 
problem 


Iteration 

number 

Modified  Conjugate 

Gradient  Method 

^n'V 

0 

0. 35919006xl02 

0. 24209658X104 

1 

0. 76073334X101 

0.12886314xl02 

2 

0.11789181X101 

0. 52546089x10° 

3 

0 . 1172732  8x10 1 

0. 225.17444xl0"4 

4 

0 . 11727  39  8X101 

0. 21432484xl0~3 

5 

0.11727564X101 

0. 45933457xl0-6 

6 

0.11727565X101 

0. 55115803X10-8 

7 

0.11727560X101 

0. 16968429xl0~7 

8 

0.11727560X101 

0.17853686X10'11 

9 

0.11727559X101 

0. 44314448X10-12 

10 

0.11727559X1O1 

0. 30146695xl0_12 

11 

0.11727559X101 

0.14755488X10"15 

CPU  Time=52 . 6 

sec  (with  plot) 

Storage  =  140K 
uQ (r ,  t)  = 

Total  (ARRAY  +  OBJECT  CODE) 

lOe  sinirr  cosirt. 

While  performing  the  numerical  study  of  the  effects  of 
the  penalty  constant  on  the  solution,  it  was  discovered  that 
by  guessing  the  initial  control  to  be  identically  zero  (i.e., 
UQ<r,t)=0)  the  conjugate  gradient  method  converged  numerically 
in  exactly  one  iteration.  The  results  of  the  iteration  for 
the  case  where  an=5 ,  x  {r ,  0)  =sinirr ,  and  Ug(r,t)  =  0  are  given 
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in  Table  5.5.  Further  numerical  investigafcic n  with  different 
initial  conditions  and  initial  guessed  controls  indicate  that 
the  rapid  convergence  was  due  to  the  particular  combination 
of  the  initial  conditions  and  the  initial  guessed  control 
(i.e.,  x(r,0)=simrr  and  UQ(r,t)=0).  Numerical  results  for  the 
case  where  <*n=5,  x(r  ,0 )  =r  (1-r)  ,  and  Ug(r,t)  =  0  are  given  in 
Table  5.6.  It  is  evident  from  these  results  that  when  the 
initial  condition  is  polynomial/  then  numerical  convergence 
from  the  initial  guess  uQ(r,t)=0  is  not  obtained  in  one 
iteration . 

The  theoretical  implications  of  these  results  are 
interesting.  It  appears  that  when  the  initial  conditions 
are  trigonometric  (e.g.  x  (r , 0)  =sinirr)  then  the  solution  of 
the  optimization  problem  is  in  a  finite-dimensional  subspace 
of  the  control  Hilbert  space  U.  Therefore,  the  infinite 
dimensional  problem  is  reduced  in  this  special  case  to  a 
finite-dimensional  problem.  For  example  the  solution 
might  appear  as  a  finite  double  Fourier  series  given  by 
N  M 

u*(r,t)  =  E  Z(a  „  cosnr  cosnt  +  b  _  cosnr  sinnt  (5.18) 
nm  nm 

+  c  „  sinnr  cosnt  +  d  „  sinnr  sinnt] . 
nm  nm 

The  parameter  optimization  problem  would  then  be  to  determine 
the  Fourier  coefficients  a  ,  b  .  c  .  and  d  .  It  appears 
that  in  this  special  case  the  minimizing  element  of  U  is 
contained  in  the  one-dimensional  subspace  spanned  by  the 


101 


approximate  gradient  of  J  at  Uq=0.  In  addition  the  initial 
guess  Uq  does  not  translate  the  direction  of  search  out 
of  this  subspace.  Thus  only  a  single  one-dimensional  minimi¬ 
zation  is  required  to  obtain  the  approximate  numerical  solu¬ 
tion.  Further  comments  on  how  this  observation  could  possibly 
be  used  to  generate  an  analytical  theory  for  a  special  class 
of  problems  will  be  discussed  in  the  next  chapter. 

Table  5.5.  The  solution  of  Example  5.2  with  a  trigometric 


initial  condition  (i.e. ,  XQ(r)=simrr) 


Iteration 

number 

Modified  Conjugate  Gradient  Method 

0  [u  ]  <g  ,q  > 

Zrp  1  n  ^n  ^n 

0 

0.25092812X101  0. 10533506xl02 

1 

0.81215934x10°  0 . 17635311xl0_28 

Initial  Conditions: 

x(r,0)=simrr,  xfc(r,0)=0 

Initial  guessed  control:  Ug(r,t)=0 

an=5,  6=.5 
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Table  5.6.  The  solution  of  Example  5.2  with  a  non- 

trigonometric  initial  condition  (i.e.,  xn(r)= 
r(l-r)) 


Iteration 

number 

Modified  Conjugate  Gradient  Method 

0.  (u  ]  <g  ,g  > 

t*pl  n  ^n'^n 

0 

0. 47825656xlO-1 

0.18528340x10° 

1 

0. 18137734X10-1 

0. 23826587X10"2 

2 

0. 18025130X10"1 

0. 11387571xl0“5 

3 

0. 18025621xl0_1 

0. 53594771xl0_6 

4 

0. 18025653xl0_1 

0. 35357040xl0“9 

5 

0.18025653xl0“1 

0. 34480603xl0"13 

6 

0 . 18025714xl0-1 

0. 74813105xl0"14 

Initial  Conditions 

:  x(r,0)=r (1-r) ,  xt 

(r ,  0)  =0 

Initial  Guessed  Control:  UQ(r,t)=0, 

an~5 i  B-.5 

Example  5.3.:  The 

Boundary  control  of 

the  vibrating  string 

The  fixed  time,  penalized,  minimum  energy  boundary  control 
of  the  vibrating  string  is  considered.  The  problem  may  be 
stated  as  follows: 


minimize 


RF 


x2 (r ,T, 


)dr  +  6 


(t) dt , 


J [ub]=a 


0 


(5.19) 


10  3 


subject  to 

Sx(r ,t)=0, 

x(r,0)=xQ (r) , 
xfc (r , 0 ) =Vq (r) , 
Tx  (0  ,  t)=Uj3  (t)  , 
x (1 , t) =0 , 


-x  (r ,  t) 


(5.20) 


ub<t) 


Figure  5.5.  Boundary  contrbl 
of  the  vibrating 
string 


2  2 

h*  -v A 

where  S  =  — *■  -  — =■  ,  T=I,  x*(r)=simrr,  a=8=l ,  RP=1,  and 
3t  drz  J 

T^“4.  The  initial  and  boundary  conditions  are  illustrated 
in  Figure  5.5.  The  defining  equations  and  their  discrete 
approximations  are  given  in  Table  5.7. 

The  results  of  the  solution  for  the  minimum  effort, 
boundary  control  of  the  vibrating  string  are  given  in  Tables 
5.8  and  5.9,  and  are  illustrated  in  Figures  5.6  and  5.7. 

Table  5. 8  contains  the  results  of  the  iteration  when  the 
initial  guessed  control  is  identically  zero  (i.e.  ,  uQ(t)=0). 
The  initial  guessed  boundary  control,  the  initial  trajectory, 
the  numerically  converged  approximate  optimal  control,  and 
the  corresponding  approximate  optimal  trajectory  are  shown 
in  Figure  5.6,  (a),  (b) ,  (c) ,  and  (d) ,  respectively.  As  in 

Example  5.2,  when  the  initial  boundary  control  was  guessed 


identically  zero,  the  iteration  converged  in  one  iteration.  ! 
The  explanation  for  the  rapid  convergence  is  essentially  the  j 
same  as  that  given  in  Example  5.2. 

I 


Table  5.7.  Summary  of  equations  for  Example 


Integration  is  performed  backwards  in  time. 


Table  5.7  (Continued) 
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Table  5.8.  Results  for  Example  5.3  with  u^(t)=0 


Iteration 

number 

Modified  Conjugate 
2[“n] 

Gradient  Method 

fg  ,q  > 

Jn  n 

n 

0 

0.50185624x10° 

0.  88209975x10^ 

la 

0.10027777x10° 

0. 31521281xl0~28 

ERROR  ANALYSIS: 

Optimal  Control 

ERROR:  |  '  •  ■  J*  |  j  £ 

0.17718170x10° 

Cost  Functional 

ERROR:  | J lu 

|  <  0.20936213x10°, 

where  | |e  (u* ,h) 

| |  -  0. 35436341x10° 

f 

|  | g  (u* ;qh) 

||2  =  0.  51743680x10' 

-1 

and 

1 |u*| j  =  0 

. 28581078xl0u. 

CPU  time  =  10.58 

sec.,  Storage  =  32400  BYTES. 

Convergence  in  one  iteration  occurred  only  when 
uQ(t)=0  was  used  as  the  initial  control  quess. 

The  results  of  the  iteration  for  a  different  initial 
guess  of  the  control  (i.e.,  u^(t}=-10e  fccos  2nt)  are  pre¬ 
sented  in  Table  5.9.  The  initial  guessed  control,  the 
approximate  optimal  control,  and  the  corresponding  approxi¬ 
mate  optimal  trajectory  are  illustrated  in  Figure  5.7,  (a) , 

(b) ,  and  (c)  respectively.  The  results  contained  in  Table 
5.8  and  Fiqure  5.7  indicate  that  the  modified  conjugate 
gradient  method  will  converge  (as  expected)  from  a  relatively 
poor  initial  guess.  The  converged  solution  from  each  of  the 
two  different  initial  guesses  arc  the  same,  as  demonstrated 
in  Tables  5.8  and  5.9  and  in  Figures  5.6  and  5.7. 


I 
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Table  5.9.  Results  of  Example  5.3  with  u0(t)=-10e  ^  cos  2irt 


Iteration 

number 

The  Modified 

1  Conjugate  Gradient  Method 

<(VV  ^n+l'V 

0 

0.26607469X102 

0. 22227374x10° 

1 

0. 13385153xl02 

0. 10305737xl03  0 . 1776 356 8xl0_1 3 

2 

0. 83627214X101 

0. 14073426xl03  ~0 . 5 728750 8xl0_1 3 

3 

0. 24645929X101 

0. 38506615xl02  0. 82156503xl0“14 

4 

0.45224374x10° 

0.91346661X101  0. 38941072xl0-13 

5 

0.21074810x10° 

0. 20603200X101  -0 . 194 2 890 2xl0_1 5 

6 

0.14740533x10° 

0.33660356x10°  0 . 21510571xl0-15 

7 

0.14688323x10° 

0.18495487x10°  -0 . 1301 0426xl0"16 

8 

0.10460883x10° 

0 . 34394526x10°  0 . 18561541xl0_15 

9 

0. 96151817xl0-1 

0. 612 446 88x10“ 1-0. 49699827xl0”15 

10 

0.97951980xl0_1 

0. 1998572 3xl0-3  0 . 10 31 0762xl0-15 

11 

0. 10793993x10° 

0. 52750740xl0_1  0. 34640259xlo”16 

12 

0.10831308x10° 

0. 38984046xl0"1  0 . 22421300xl0_15 

13 

0.10187977x10° 

0. 43182735X10-1  0 . 1 151 1516xl0_1 5 

14 

0. 9760839 4xl0_1 

0. 6 36 5626 3X10--1 -0.1 400 789  2x10” 15 

15 

0.97377092xl0-1 

0. 6617224 8xl0_1-0. 8 118 505 8x10” 15 

16 

0.10327465x10° 

0. 58561917xl0_2  0 . 13216424xlo”15 

17 

0.10309298x10° 

0.  >17066195x10“2-0. 37100040>:10“17 

18 

0.10064508x10° 

0. 25846238xl0”2  0 .  38916081xl0”16 

19 

0.10031671x10° 

0 . 962  3344 9x10” 3  0 . 1 720 832 lxlO” 16 

20 

0. 10065392x10° 

0. 37240919xl0~3-0. 6 6204095x10” 16 

21 

0.10036221x10° 

0. 12510802x10_4-0. 5 1698653x10” 17 

CPU  Time=34.5  sec  (with  three  dimensional  plot). 

Storage  =  126  K  (total). 

UQ(t)=-10e  cos  (2irt)  . 

Figure  5.7.  The  solution  of  the  minimum  energy  boundary 
control  of  the  vibratina  strinq 


110 


The  error  analysis  presented  in  Table  5.8  indicates 
that  the  optimal  control  error  and  the  cost  functional  error 
are  smaller  than  in  Example  5.1.  This  is  probably  due  to 
the  fact  that  there  is  less  discretization  in  boundary  control 
problems  than  in  distributed  control  problems.  As  indicated 
in  Tables  5.8  and  5.9  the  storage,  and  the  CPU  times  are 
also  reduced  in  this  boundary  control  problem.  The  reduced 
storage  requirements  are  due  to  the  fact  that  in  boundary 
control  problems  the  control,  the  gradient,  and  the  direction 
of  search  are  all  singly  subscripted  arrays.  The  reduced 
CPU  times  are  due  to  the  fact  that  in  boundary  control 
problems  there  are  no  double  integrals  to  be  approximated. 

The  accuracy  of  the  inner  loop  iterator  is  indicated 
in  column  4  of  Table  5.9.  The  approximate  directional 
derivative  at  the  one-dimensional  minimum  in  the  direction 
§n  is  given  by  <9n+i'an>'  and  theoretically  should  be  zero. 
However  the  numerical  accuracy,  which  is  in  the  order  of 

-14 

10  ,  is  completely  satisfactory.  When  the  other  inner 

loop  iterators  were  utilized  <g  .j_,sn>  was  never  smaller 
than  10  3,  which  demonstrates  the  validity  of  the  conclusions 


contained  in  Chapter  IV. 


CHAPTER  VI.  CONCLUDING  REMARKS 


The  original  objective  of  this  investigation  was  to 
develop  practical  means  of  optimizing  distributed  parameter 
systems.  The  gradient  methods  appeared  to  be  a  promising 
class  of  methods  for  solving  distributed  parameter  optimal 
control  problems.  However,  early  numerical  results  were 
disappointing.  Storage  requirements  and  computation  times 
for  test  problems  indicated  that  the  study  of  complex  dis¬ 
tributed  parameter  systems  would  probably  be  beyond  the  stor¬ 
age  capabilities  of  the  present  computer  system,  and  certain¬ 
ly  beyond  a  reasonable  computer  usage  budget.  Consequently, 
as  is  the  case  in  many  investigations ,  the  dissertation  ob¬ 
jectives  were  modified  as  work  progressed.  It  soon  became 
apparent  that  in  solving  continuous  distributed  parameter 
optimization  problems  on  the  digital  computer  the  numerous 
approximations  involved  in  transforming  the  continuous  problem 
into  a  discrete  problem  were  causing  considerable  errors.  In 
order  to  efficiently  obtain  an  approximate  solution,  it  be¬ 
came  evident  that  the  understanding  of  the  effects  of  these 
various  approximations  was  essential.  Therefore,  the  modified 
objectives  of  developing  an  approximation  theory  for  the 
numerical  optimization  of  distributed  parameter  systems  were 
considered.  These  objectives  have  been  achieved  for  a 
particular  class  of  problems  which  are  of  practical  signifi¬ 
cance.  Also,  even  though  the  original  objective  was  not 


achieved,  substantial  progress  was  made  towards  this  objective. 

An  introduction  to  the  optimization  of  distributed 
parameter  systems  was  presented  in  Chapter  I,  including  a  dis¬ 
cussion  of  both  the  theoretical  and  the  numerical  results 
which  exist  at  the  present  time.  Some  of  the  engineering 
applications  for  the  optimization  of  distributed  parameter 
systems  were  also  discussed. 

A  general  theory  for  linear  distributed  parameter  systems 
was  presented  in  Chapter  II.  The  necessary  conditions  for  a 
local  relative  minimum  were  developed  from  the  functional 
derivative  point  of  view.  It  is  felt  that  this  development 
is  in  the  spirit  of  the  subsequent  use  of  gradient  methods, 
since  it  indicates  exactly  how  the  gradient  of  the  cost  func¬ 
tional  is  to  be  calculated.  The  penalty  function  method  was 
introduced  to  render  the  constrained  problems  amenable  to 
gradient  methods.  A  brief  discussion  of  the  various  methods 
of  solving  optimization  problems  was  included. 

The  three  most  popular  gradient  methods  were  discussed  in 
Chapter  III.  A  comparison  between  the  conjugate  gradient 
method,  the  Davidon  method,  and  the  steepest  descent  method 
was  given.  Three  linear  minimization  methods  were  introduced, 
ar  discussed. 

In  Chapter  IV  the  analysis  of  the  effects  of  the  many 
approximations  on  the  solution  of  the  optimal  distributed 
parameter  control  problem  was  presented.  It  was  found  that  in 
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n 


the  numerical  optimization  of  distributed  parameter  systems 
by  gradient  methods  the  round-off  errors  are  negligible 
when  compared  with  other  approximations  involved.  Further¬ 
more,  it  became  apparent  that  the  approximations  of 
integral  operators  (e.g. ,  the  cost  functional,  inner  products, 
etc. )  could  be  made  several  orders  of  magnitude  more  accurate 
than  the  approximations  of  the  differential  operators.  Hence, 
the  two  primary  error  sources  were  found  to  be,  (1)  the 
approximation  of  functions,  and  (2)  the  approximation  of  dif¬ 
ferential  operators.  It  was  also  shown  how  these  errors  effect 
the  computation  of  the  gradient  vector,  which  obviously 
directly  influences  the  convergence  of  the  approximate  gradient  ■ 
methods.  In  addition,  it  was  demonstrated  that  the  approxi¬ 
mate  gradient  vector  is  the  exact  gradient  of  another  quad¬ 
ratic  functional,  defined  on  a  subspace  of  the  original  control 
space.  This  result,  when  coupled  with  the  analysis  of  the 
inner  loop  iteration,  leads  to  necessary  and  sufficient  con¬ 
ditions  for  the  numerical  convergence  of  the  approximate 
gradient  methods.  It  was  shown  that  when  method  3  (refer 
to  Theorem  4.2)  is  utilized  in  the  inner  loop  then  the  approxi¬ 
mate  gradient  methods  converge  to  the  minimum  of  J(*),  not 
J(’)  nor  its  discrete  analog  ^(*).  This  lead  to  the  concept 
of  optimal  control  error  and  of  the  suboptimality  of  the 
approximate  solution.  Results  1 sading  to  the  estimates  of  the 
optimal  control  error  and  the  suboptimality  of  the  approximate 
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solution  were  established.  A  geometrical  interpretation  of 
these  estimates  was  included.  It  was  recognized  that  these 
estimates  could  be  used  to  prove  the  convergence  of  the 
approximate  solution  to  the  exact  solution  as  discretization 
tends  to  zero.  Besides  indicating  the  accuracy  of  a  particular 
solution/  these  error  estimates  yield  a  discernment  into  what 
type  of  improvements  to  the  approximate  solution  are  feasible. 

Numerical  results  for  both  the  constrained  and  the  un¬ 
constrained  optimal  control  of  the  one-dimensional  wave 
equation  were  given  in  Chapter  V.  Both  distributed  and 
boundary  control  of  the  wave  equation  were  considered.  The 
standard  numerical  comparisons  between  the  modified  conjugate 
gradient  method,  the  Davidon  method,  and  the  steepest  descent 
method  were  reported.  It  is  evident  from  these  results  that 
the  second  generation  gradient  methods  offer  a  substantial 
improvement  over  the  steepest  descent  method,  especially  with 
regard  to  the  number  of  iterations  required  to  achieve  numeri¬ 
cal  convergence.  This  is  particularly  significant  in  the 
minimization  of  distributed  parameter  systems,  since  each 
iteration  is  very  costly  in  terms  of  computer  time.  Although 
the  Davidon  method  converged  faster  (based  on  the  number  of 
iterations) ,  the  modified  conjugate  gradient  method  required 
considerably  less  storage  and  computer  time  to  obtain  com¬ 
parable  results.  Thus,  for  the  class  of  distributed  parameter 
systems  considered  the  modified  conjugate  gradient  method 
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appears  to  be  the  most  efficient  method  of  solution. 

It  was  not  expected  that  the  Davidon  method  would  con¬ 
verge  more  rapidly  than  the  conjugate  gradient  method,  since 
the  theoretical  results  in  (32)  indicate  that  for  this  class 
of  problems  these  two  iterations  should  produce  the  exact 
same  results.  In  (46)  similar  differences  are  reported  for 
the  pure  and  the  modified  conjugate  gradient  methods  on  a 
quadratic  programming  problem.  The  reasons  given  in  (46) 
for  this  behavior  were  based  on  round-off  errors.  However, 
in  this  case  the  Davidon  method  is  by  far  the  more  complex  of 
the  two  methods,  and  hence  should  be  more  sensitive  to  round¬ 
off  errors.  It  is  doubtful  that  in  general  round-off  errors 
would  increase  the  rate  of  convergence.  Thus  the  answer  to 
this  problem  remains  an  open  question. 

Due  to  the  discretization  processes  involved  in  the 
solution  of  the  optimization  problem  on  a  digital  computer 
the  approximate  gradient  methods  do  not  converged  to  the 
exact  solution.  It  was  shown  that  this  is  due  to  gradient 
error.  Some  authors  have  argued  that  when  substantial  gradient 
errors  are  present  the  more  powerful  gradient  methods  should 
not  be  used,  since  the  effects  of  gradient  errors  might  be 
amplified  by  these  methods.  The  results  contained  in  this 
work  indicate  that  this  is  not  necessarily  the  case.  Cer¬ 
tainly,  it  is  true  that  in  the  presence  of  gradient  errors 
the  gradient  methods  will  not  yield  the  exact  solution. 
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However,  if  used  properly,  the  more  powerful  gradient  methods, 
such  as  the  conjugate  gradient  method  or  the  Davidon  method, 
find  the  approximate  solution  much  faster;  hence,  at  less 
cost  than  the  more  simple  gradient  methods. 

In  conjunction  with  the  theory  and  the  numerical  results 
presented  here,  there  remain  many  open  problems.  The  ex¬ 
tension  of  both  the  conjugate  gradient  method,  and  the 
Davidon  method  to  non-quadratic  distributed  parameter  systems 
is  certainly  feasible,  and  the  potential  results  of  such  ex¬ 
tensions  appear  to  be  very  promising.  An  extension  of  the 
error  analysis  presented  in  Chapter  IV  to  nonlinear  problems 
would  be  of  value.  Also,  consideration  of  a  more  conserva¬ 
tive  cost  functional  error  estimate  would  be  useful.  Appli¬ 
cations  of  numerical  methods  to  bounded  distributed  parameter 
control  problems  have  not  received  a  great  deal  of  attention  to 
date.  The  results  contained  in  Tables  5.5  and  5.7  indicate 
that  the  application  of  a  Ritz  method  to  the  class  of  problems 
with  trigonometric  initial  conditions  may  prove  fruitful.  In 
addition,  if  the  initial  conditions  are  not  trigonometric, 
then  they  could  be  approximated  by  an  appropriate  Fourier 
series,  which  could  be  conveniently  truncated  to  obtain  an 
approximate  solution.  For  problems  with  linear  dynamics  an 
investigation  of  the  penalty  constants  which  circularize  the 
constant  cost  contours  in  the  control  space  would  be 
beneficial,  since  this  would  increase  the  rate  of  convergence 


for  constrained  problems.  A  more  efficient  method  for  pro¬ 
gramming  Davidon's  method  with  emphasis  placed  on  reducing 
the  storage  requirements  of  the  method  would  be  a  contribu¬ 
tion.  More  sophisticated  interpolation  methods  would  most 
likely  reduce  the  optimal  control  error  and  the  cost  function¬ 
al  error.  The  utilization  of  cubic  splines  present  interest¬ 
ing  possibilities.  Finally,  it  is  suggested  that  the  error 
analysis  and  the  approximation  theory  developed  in  this  work 
be  applied  to  lumped  parameter  control  problems,  and  to 
parameter  optimization  problems  (where  the  gradient  vector 
is  obtained  from  a  finite-difference  formula). 

Although  much  has  already  been  accomplished,  the  interest 
and  the  activity  in  this  area  still  remains  high;  and  it  is 
felt  that  this  field  still  offers  numerous  possibilities 


for  research. 
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APPENDIX  A.  MATHEMATICAL  PRELIMINARIES 

In  the  development  of  an  approximation  theory  for  the 
numerical  optimization  of  distributed  parameter  systems, 
results  from  several  areas  of  applied  mathematics  are 
utilized.  The  most  important  of  these  areas  include  (1) 
numerical  analysis,  (2)  functional  analysis,  (3)  optimization 
theory,  (4)  partial  differential  equations,  and  (5)  approxi¬ 
mation  theory.  In  an  attempt  to  make  this  dissertation 
reasonably  self-contained,  a  limited  collection  of  definitions 
and  theorems  from  these  areas  will  be  presented.  By  necessity, 
the  treatment  will  be  brief  and  incomplete;  only  material 
which  is  used  in  this  dissertation  will  be  discussed.  In 
some  instantancies  standard  definitions  and  results  will  be 
altered  to  include  concepts  which  are  not  introduced  else¬ 
where  in  this  appendix.  It  will  be  assumed  throughout,  that 
the  reader  is  familiar  with  standard  mathematical  notation. 

Selected  Results  from  Functional  Analysis 

The  natural  setting  for  the  application  of  gradient 
methods  to  distributed  parameter  systems  is  a  real  separable 
Hilbert  space.  The  reasons  for  this  are  (1)  gradient  methods 
require  the  concept  of  direction  (hence,  an  inner  product) 
in  the  function  space  in  which  the  iteration  takes  place,  and 
(2)  separability  and  completeness  are  required  in  the  proof 
of  convergence.  Unfortunately,  the  definition  of  a  Hilbert 
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space  as  "a  complete  inner  product  space"  leaves  much  unsaid, 
and  consequently  some  additional  preliminary  concepts 
need  to  be  introduced. 

Definition  A-l:  A  linear  space  is  a  set  X  for  which  there 
are  defined  an  operation  of  addition  denoted  by  +  ,  so  that 
{X;  +  }  is  a  commutative  group;  and  an  operation  of  scalar 
multiplication  satisfying  the  distributive  law 
a  (x+y)=ax+8y , (a+8) x=ax+8x,  and  (aB)x=a(£x),  lx=x  for  all 
x,yeX  and  cc/Be^a  scalar  field. 

In  what  follows  the  terms  linear  space  and  vector  space 
will  be  used  interchangeably,  and  the  elements  of  a  linear 
(vector)  space  will  be  referred  to  as  vectors.  If  T  is  the 
field  of  real  numbers,  then  X  is  a  real  vector  space ;  if  ^ 
is  the  field  of  complex  numbers,  then  X  is  a  complex  vector 
space . 

Definition  A-2;  A  nonempty  subset  S  of  a  linear  space  X  is 
called  a  subspace  in  X  if  x+y  is  in  S  whenever  x  and  y  are 
both  in  S  and  if  also  ax  is  in  S  whenever  x  is  in  S  and  a 
is  any  scalar. 

Definition  A-3;  A  functional  is  a  mapping  of  a  linear  space 
X  into  the  scalars  ,  i.e.,  f:  X+R^. 

Definition  A-4:  An  inner  product  on  a  linear  space  X  is  a 


complex  valued  function  of  two  elements  selected  from  the 


L2' 


space  X,  denoted  by  <x,y>,  and  satisfying  the  conditions: 

(i)  <x,y>  is  linear  as  a  function  of  x  for  fixed  y. 

(ii)  <y,x>  =  <x,y>  (the  complex  conjugate). 

(iii)  <x,x>  >0  if  x^O. 

Definition  A-5;  A  norm  is  a  real  function,  | | . | | ,  defined 
on  a  linear  space  X  satisfying,  for  all  vectors  x,yeX, 
ae^  the  conditions: 

(i )  |  |  x  |  |  >0  x^O 

(ii)  |  |x+y |  |  £  |  j  x |  |  +  | ! y 1 |  (triangular  inequality) 

(iii)  ] | ax | |  =  |a|  ||x||  (homogeneity) . 

Definition  A-6 :  An  inner  product  space  is  a  linear  space 
together  with  an  inner  product  defined  on  it. 

Remark :  In  an  inner  product  space  the  function 

||*||  =  /T“.T  is  a  norm. 

Definition  A- 7:  A  Cauchy  sequence  in  an  inner  product  space 

is  a  sequence  {x^}  such  that  to  each  e>0,  there  corresponds 

a  number  N  such  that  I  I  x  -x  I  I  <  s  whenever  n>N  and  m>N. 

1  1  n  m 1  1 

Definition  A-8:  A  normed  linear  space  is  said  to  be  complete 
if  every  Cauchy  sequence  in  it  is  convergent  (a  normed  linear 
space  is  a  linear  space  together  with  a  norm) . 


The  definition  of  a  Hilbert  space  can  now  be  given. 
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Definition  A-9;  A  Hilbert  space  is  a  complete  inner  product 
space . 

Definition  A-10;  A  space  is  called  separable  if  it  contains 
a  countable  dense  subset. 

In  a  separable  Hilbert  space  there  exists  at  least  one 
linearly  independent  set  of  vectors  which  spans  the  space. 
Hence  every  vector  can  be  written  as  a  countable  linear 
combination  of  this  linearly  independent  set.  The  partial 
sums  of  this  countable  linear  combinations  forms  a  Cauchy 
sequence;  hence,  converge  to  a  unique  element  in  the  space. 

Another  concept  which  is  important  is  that  of  a  linear 
trans  formation . 

Definition  A-ll:  If  x  and  Y  are  linear  spaces,  a  mapping 
T:  X-*-Y  is  a  linear  transformation ,  if  for  all  scalars  a,  8, 
T(ay+6y)=  aTx+BTy,  for  all  x,yr.X. 

Definition  A-12:  A  linear  operator  is  a  linear  transformation 
of  X  into  X,  i.e.,  T:X-*X. 

Remark ;  The  operator  defined  by 
Tf  Rf 

Lu  -f  Gp(r,t;5,T)u(5,T}dP  , 

Jo  Jo  F 

is  linear  {due  to  the  properties  of  the  definite  integral) . 
Definition  A-13:  Let  X  and  Y  be  n.^rmed  spaces  and  let 
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AeB[X,Y).  The  adjoint  operator  A*:  Y*-*-X*  is  defined  by  the 
equation  <x,A*y*>=<Ax,y*> . 

Definition  A-14;  An  operator  A  defined  on  a  Hilbert  space 
is  said  to  be  self-adjoint  if  A=A* . 

Definition  A-15:  An  operator  A  defined  on  a  Hilbert  space 

2  2 

is  said  to  be  a  projection  if  A  =A  and  A*»A,  where  A  kAA. 

Definition  A-16;  A  vector  x  is  orthogonal  to  a  subset  M 
of  an  inner  product  space  X,  if  <x,y>=0  for  all  yeM.  This 
is  written  xxM.  The  set  of  all  such  vectors  is  called  the 
annihilator  of  M  and  is  written  as  Thus 

M*" —  {x :  xx M ,  xeX} 

Theorem  A-l :  (63)  (The  Projection  Theorem)  If  M  is  a  sub¬ 

space  of  X,  then  M+MX=X,  where  X  is  a  Hilbert  space. 

Definition  A-17:  It  S  be  a  subset  of  the  domain  of  the 
function  f,  where  xeX,  then  an  evaluation  map  Eg  is  defined 
by  (f  (x) ;  xeS}. 

Definition  A-18;  A  real  valued  functional  f  defined  on  a 
convex  subset  C  of  a  linear  space  is  said  to  be  convex  if 

f  (ax1+  (1-ct)  x2)  <_af  (x^)  +  (1-a)  f  (x2) 

for  all  x^,x2eC  and  all  a,  0<a<l. 


Definition  A- 3.9:  J ( •  ]  is  a  quadratic  functional,  defined  on 
a  real  Hilbert  space  X,  if  J[«]  has  the  general  form 

J  [x]*=Jg-t<C/X>  +  — <x,Ax>, 

where  A  is  a  linear  operator. 

Remark:  It  can  be  shown  that  a  quadratic  functional  is 

convex  (52). 

Theorem  A- 2 :  (22)  (The  existence  of  the  Optimal  Control) 

If 

2 

(i)  H  is  a  Hilbert  space  (e.g.,  L  ), 

(ii)  U  is  a  closed  convex  bounded  subset  of  H, 

(iii) J  is  a  real  continuous  convex  function  on  U, 
then  there  exists  a  u*eU  such  that 

J[u*]  =  inf  J[u] 
ueU 

From  Theorem  A-2  the  solution  to  the  problem  formulated 
in  Chapter  II  exists  and  is  unique. 

The  following  results  are  utilized  by  Chapter  IV  to 
determine  the  spectral  bounds  of  an  operator. 

Definition  A-20:  The  set  of  all  complex  numbers  X  such  that 
A-Ai  is  invertible  (has  an  inverse)  for  a  given  operator  A, 
is  called  the  resolvent  set  of  A  and  is  denoted  by  p (A) . 


Definition  A-21;  The  spectrum  of  an  operator  A  is  denoted 
by  a  (A),  where  a(A)={X:  X|fp  (A) } . 

Lemma  A-l :  (64)  Let  AeB[H],  where  B[H]  denotes  the  set 

of  bounded  operators  on  H,  a  normed  space,  then  A* A  is  self- 
adjoint. 

Lemma  A- 2 :  (64)  If  AeB(H],  then  I+A*A  is  self-adjoint. 

Lemma  A- 3;  (63)  If  A  is  self-adjoint,  then  o(A)CR^. 

Lemma  A- 4:  (65)  The  spectrum  of  a  skew  symmetric  operator 
is  pure  imaginary  (A  is  skew  symmetric  if  A=-A*). 

Lemma  A- 5;  Let  A  be  a  completely  continuous,  skew  symmetric 
operator  then 

l<a(I+A*A)  . 

Another  important  theorem  in  analysis  which  is  especially 
useful  in  proving  convergence  of  iteration  formulas  is  the 
Fixed  Point  Theorem.  The  basis  of  this  theorem  is  the  concept 
of  a  contraction  mapping. 

Definition  A-22:  Let  A  be  a  mapping  (not  necessarily  linear) 
of  a  Hilbert  space  H  (or  Banach  space  or  even  a  metric  space) 
into  itself.  Let  for  some  ot,  0<a<l, 

|  j  A  (x) -A (y)  |  |  <_  a|  |  x-y  |  |  , 

for  all  y,  yeH.  Then  A  is  said  to  be  a  contraction . 
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Theorem  A- 3:  (63)  (Fixed  Point  Theorem)  Every  contraction 

A,  defined  on  a  Hilbert  space  has  one  and  only  one  fixed  point, 
i.e.,  Ax— x  has  one  and  only  one  solution  for  xeH.  (This 
theorem  holds  also  for  metric  spaces). 

In  the  optimal  control  of  distributed  parameter  systems 
the  extension  of  the  concept  of  a  derivate  in  a  Hilbert 
space  setting  is  useful. 

Definition  A-23:  (31)  Let  xeDCX  and  let  h  be  arbitrary 

in  X.  If  the  limit 


dJ  ,  J  [x+ah]-J  [x] 

-  Um  - 5 -  , 

a-0 

exist,  it  is  called  the  Gateaux  differential  of  J  at  x  with 
increment  h.  If  this  limit  exist  for  each  hex,  the  functional 
J  is  said  to  be  Gateaux  differentiable  at  x. 


Definition  A-24:  (31)  Let  J  be  a  transformation  defined  on  an 

open  domain  D  in  a  normed  space  X  and  having  range  in  a 
normed  space  Y.  If  for  fixed  xeD  and  hex  there  exists 
6J(x;h)eY  which  is  linear  and  continuous  with  respect  to  h 
such  that 


lim  1  | J (x+h) -J (x) -6J  (x ;h)  {_  =  ^ 

IN  |-o  INI 

then  J  is  said  to  be  Frechet  differentiable  at  x  and 
6j(x?h)  is  said  to  be  the  Frechet  differential  of  J  at  x 


with  increment  h 
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Pertinent  Results  from  Partial 
Differential  Equations 

The  concept  of  well-posedness  plays  an  important  role  in 
the  treatment  of  partial  differential  equations.  It  is 
analogous  to  the  concepts  of  existence  and  uniqueness  of 
solutions  in  ordinary  differential  equations.  In  this 
dissertation  an  initial- value  problem  will  be  called  well- 
posed  if 

(i)  a  solution  exist, 

(ii)  the  solution  is  unique, 

(iii)  the  solution  depends  continuously  on  the 
initial  data. 

Reference  (60)  contains  a  detailed  discussion  of  well- 
posedness  of  the  abstract  Cauchy  initial  value  problem. 

Another  important  result  used  in  Chapter  II  is  the 
representation  of  the  solution  of  a  partial  differential 
equation  in  terms  of  linear  operators 

x  (r,  t)  =<P  (t)  xQ  +  S  K)ud  +  T  (t)ufa  .  (A-l) 

Balakrishnan  (20)  has  given  conditions  under  which 
Equation  A-l  represents  the  solution  to  the  nonhomogenous 
partial  differential  equation.  In  this  work  only  problems 
for  which  the  Green’s  function  yields  the  representation  given 
in  Equation  A-l  are  considered. 

For  example,  consider  the  one-dimensional,  non-homogenous 
wave  equation  defined  by 
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xtt  =  xrr  +  ud' 

(A-2 ) 

x (r , 0) =0 , 

(A-3) 

xfc (r,0)=0. 

(A-4) 

x  (0, t)=0. 

(A-  5 ) 

x (1 , t) =0 , 

(A-6 ) 

This  second  order  system  can  be  rewritten  as  an  equivalent 


first  order  system  by  considering  the  transformation  defined 

3x 
3r ' 


by  v  =  and  w  =  4^;  hence,  the  above  system  becomes 


3w 

0 

s’ 

w 

- 

0 

3 1 

= 

3r 

+ 

3v 

3 

,ud. 

,3t 

.3r 

0 

V 

•  • 

In  order  to  obtain  the  Green's  function  for  this  problem, 
the  following  eigenvalue  problem  is  considered 


■«* 

*k(r) 

*k(r) 

ii 

>■ 

*• 

1$  o 

♦k<r). 

2  2 

where  <t>k(0)  =  <J>k(l)=0.  The  solution  of  this  eigenvalue 
problem  yields  +  kiri  as  the  eigenvalues,  and 


X  H* 

-i  cos(kTTr) 

sin  (kirr) 

(A-9 ) 


as  the  eigenfunctions.  By  using  these  eigenfunctions,  the 
Green's  function  for  this  problem  is  given  by 


Gp(r,t;£/T)  = 


E  coskr  (t-t) coskrrr  coskff^i  E  sink*  (t-T) coskTrr  sinktrS 
k=l  I k-1 

l 

- 4- - 

t 

OO  I  00 

■  E  sinkn  (t-T  )  sinkrrr  coskr^!  E  coskir  (t-x )  sinkrrr  sinkir^ 
k=l  | k=l 


(A- 10 ) 


which  is  represented  by 


Gp(r,t;€,r)= 


GH^r*  t;?/T) 


[G2i (r, t;£,T) 


G12  ( r ,  t ;  5 ,  t  )' 


G 22 


(A- 11 ) 


The  solution  of  Equation  A-7  can  then  be  written  as 


w  (r ,  t) 

f1 

G11  G12 

0 

0  • 

0 

v  (r,  t)j 

G2 1  G22 

_ud(5 ,T) 

dCdx. 


(A-12 ) 


From  the  above  relation  the  Green's  function  for  the  original 
problem  can  be  determined  as  follows: 
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x(r,t)  = 


9x 

3r 


dr  = 


0 

Tf 

0'  O'  0 
t  f  1 1  r 


w  (r  ,t)dr 


(A-13) 


G12  ud(5/T)d^dTdr 


tzti.tr 

=  G, 9  u,  drd£dt 

1  nJ  n  a 


O'O'Q 

ttl  t r  00 


.ti.  t r  w 

!  C  Z  2si 
>J0  >0  k=l 


2sinkiT  (t-r)  sinkrrE;  coskirrdr ]  ud  (5  ,  t )d£dt , 


by  performing  the  above  indicate  integration,  the  Green's 
function  is  determined  as 

00 

2 

G*  (r,t;£,T)=  E  s— -  sinlc-rr  (t-*r)  sinkirC  sinkirr.  (A-14) 

F  Jc-l  kTr 

\ 

Numerical  investigation  of  the  convergence  of  this 
series  indicates  that  more  than  ten  terms  of  this  series 
are  required  to  obtain  three  significant  digits  of  accuracy. 

Pertinent  Results  from  Approximation  Theory 
and  Numerical  Analysis 

The  fundamental  problem  of  approximation  theory  may 
loosely  be  stated  as  follows:  determine  x*eX,  where  X  is 
a  subspace  of  a  linear  space  X,  such  that  its  distance  from 
x*eX  is  a  minimum,  that  is,  find  such  that  | |x*-x| |  is 
minimized.  The  following  theorem  is  the  basis  of  the  approxi¬ 
mation  theory  developed  in  Chapter  IV. 
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Theorem  A- 4:  (66)  If  X  is  a  normed  linear  space  and  X  is  a 
finite-dimensional  subspace  of  X,  then  given  x*eX,  there 
exists  x*eX  such  that 

|  |x*-x*|  I  _<  ||x*-x||  for  all  xeX. 

The  following  results  are  due  to  Kantorovich  (54) .  Let 

X  be  a  complete  subspace  of  the  Hilbert  space  X,  and  let  P 

2 

denote  a  projection  from  X  onto  X,  i.e.,  PX=X,  P  =P,  then 
clearly  P  does  not  alter  the  elements  of  X.  Consider  two 
iterations,  the  first  in  the  space  X 

xn+l=G(xn)'  (A'15) 

and  the  second  in  the  space  X 

xn+;,  «  S(xn).  (A-16) 

In  what  follows  Equation  A-15  will  be  called  the  exact 
iteration ,  and  Equation  A-16  the  approximate  iteration . 

The  spaces  x  and  x,  and  the  functions  G  and  <3  will  be 
connected  by  the  following  conditions. 

(i)  (Condition  that  G  and  6  be  neighboring  functions) 

for  every  xeX, 

| | PG (x)  -  S(x) | !  <  n  |  | x |  |  .  (A-17) 

(ii)  (Condition  for  elements  of  the  form  G(x)  to  be 

approximated  closely  by  elements  of  X.)  For  every 
xex,  there  is  an  xeX  such  that 


I  |  G  (x)  -x  |  I  <_  n1l|x|  I  . 

The  above  results  are  useful  in  answering  problems  en¬ 
countered  in  this  dissertation.  These  problems  include: 

(1)  establishing  the  practicality  and  convergence  of  the 
approximated  gradient  algorithms;  (2)  investigating  the 
speed  of  convergence;  and,  (3)  obtaining  suitable  estimates 
of  the  error. 

In  the  analysis  of  the  approximations  due  to  dis¬ 
cretization,  the  approximate  operator  5  often  depends  on  a 
parameter  h,  which  is  a  measure  of  the  discretization.  By 
extending  the  definition  due  to  Henrici  (67) ,  the  order  of 
an  approximate  operator  cam  be  defined  as  follows;  if  6  is 
of  order  £  then 

fi[x;h]  =  cp+1hp+1x(p+1) (x+th)  +  0(hp+2)  . 

where  c,  is  a  constant,  0<t<l,  and  lim  0. 

P+1  -  ~  h+0  * 


This  definition  is  standard  in  connection  with  the  approxi 
mation  of  differential  operators  by  difference  operators,  and 
integral  operators  by  summation  operators. 

The  approximation  of  partial  differential  equations  by 
finite-difference  operators  results  in  errors.  Before  dis¬ 
cussing  these  errors  some  additional  nomenclature  is 
required. 
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Definition  A-25;  Let  fixT  denote  the  domain  of  a  finite- 
difference  operator.  Then  the  net  (mesh,  grid)  which 
partitions  QxT  is  defined  by 

fl  =  {(r,t);  r=a^  and  t  =  b,^}. 

The  nodes  N  of  the  net  ft  are  the  points  of  intersection 
of  the  curves  which  define  the  net  (see  Figure  A-l). 


Figure  A-l.  The  node  N  of  the  net  fl 
Definition  A-26:  Let  x(ir,it)  denote  the  exact  solution  of 
partial  differential  equation  evaluated  at  the  nodes  of  the 
net.  Then  the  truncation  error  (on  the  nodes)  of  the  finite 
difference  operator  is  defined  as 

et(ir,it)  =  x(ir,it)  -  x(ir,it), 
where  x  is  the  approximate  solution  obtained  from  the  finite 
difference  method. 

The  second  source  of  errors  arises  from  the  fact  that 
x  cannot  be  calculated  with  exact  precision  because  of  the 
limited  accuracy  of  any  computing  equipment. 
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APPENDIX  B.  THE  NON-LINEAR  DISTRIBUTED  PARAMETER 
OPTIMAL  CONTROL  PROBLEM 
Problem  Formulation 


Consider  the  nonlinear  distributed  dynamical  system 


3x(r,t)  _  ,  ..  |x 

It  "  f (t,r,x(r,t) ,  >  •  •  •  > 

with  initial  conditions  x  (r,  tQ)  =xQ  (r) 
condition 


~ — un (r, t ) )  ,  (B-l) 

3r*  u 

,  and  with  boundary 


3x(r,t) 

dt 


an 


g (t,r,x, 


3x 

3r' *  *  *  ' 


,k-l 


Sr 


k-1' 


uB(t)) 


(B-2) 


where  refiCR™,  teTCR1. 

At  any  time  teT,  the  state  of  the  system  .is  denoted  by 
x(r,t),  the  distributed  control  vector  is  denoted  by 
uQ(r,t),  and  the  boundary  control  vector  is  represented  by 

4*  Vi 

ufi(t).  The  i  component  of  x(r,t)  is  written  as 

0  fK 

xi  (r-^  ,r2  , .  .  .  ,rm,  t)  eL  (51xT)  ,  i-l,2,.,.,n;  the  k  component  of 

uQ(r,t)  is  denoted  by  u^  (r^  ,r,, , . .  .  ,rm,  t)  eL2  (ftxT)  ,  i=l , 2 , . . . ,p<n ; 

tli  i  2 

and  the  i  component  of  u3(t)  is  represented  as  Ug(t)eL  (T), 

1=1.2 , ,k£n.  If  3  ft  denotes  the  boundary  of  ft,  then  r^e3ft. 

k  k 

The  notation  3  x(r,t)/3r  is  utilized  to  represent 
spatial  derivatives  of  x(r,t)  (refer  to  (28)). 

Only  well-posed  distributed  parameter  systems  with  unique 
solutions  are  considered. 

The  performance  functional  considered  is  given  by 
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J [uD(r,t) ,uB(t) ] 


G[x,r ,Tf ) J 


+ 


9ft 


K[x(r,T-)  Idft 
ft  r 


T  , .  9x 

Lg(t,r,x,  9r# 


9kx 

1?' 


UB> 


dt 


(3-3) 


9ft 


+ 


L[)(t/r/x, 


9x 

9r' 


where 


uD)dft  dt, 


(  )  dft  h:  [  [  . . .  f  (  )  dr,  dr,  . .  .  dr  , 

Jft  JrJr0  ‘  r  1  * 

12  W 

and  LD,  K,  and  G  are  sufficiently  smooth  real  valued  functions. 

The  optimum  control  problem  is  now  stated  as  follows: 
determine  from  the  set  U  of  admissible  controls  the  control 
vector  u, 


T  T  T 

u  =  [uQ,  u0] ,  ueU , 


(B-4) 


which  satisfies  the  state  equations  along  with  the  initial 
and  the  boundary  conditions  and  at  the  same  time  minimizes 

JtVUB]- 


Derivation  of  the  Necessary 
Conditions 


Let  J  be  the  functional  defined  by  Equation  B-3.  In 
order  for  J  to  have  a  relative  local  minimum  at  u*eU,  it  is 
necessary  that  the  first  Frechet  derivative  of  J  at  u*  be 
identically  zero,  that  is 


ii.  J‘m-Jlu*1 

u+u*  I  | u-u*  I  I 
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(B-5) 


9J 

3u 


u* 


=  0  . 


Let  p(r,t)  denote  the  Lagrange  multiplier  associated 
with  the  dynamical  system,  and  let  X (t)  denote  the  Lagrange 
multiplier  associated  with  the  boundary  conditions.  By  using 
the  Lagrange  multipliers  p(r,t)  and  A(t),  the  constrained 
problem  defined  by  Equations  B-l,  B-2,  and  B-3  can  be  re¬ 
formulated  as  an  uncons trained  problem,  where  the  unconstrained 
cost  functional  is  given  by 

3 [u] =G  +  [  {LK  +  < A , g-  l£>|  } dt  +  f  K  d!i  ( B- 6 ) 

Jt  b  'an  h 


f  {LD+<p,f-  |~>dft  dt.1 


If  the  constraints  are  satisfied  then  min  3  -  min  J  .  Thus, 

ueU  ueU 

the  minimum  of  J  can  be  obtained  from  the  minimum  of  3.  In 
the  following  development  it  will  be  assumed  that  the  con¬ 
straints  are  satisfied  and  hence,  3= J. 

Let  HD  denote  the  distributed  Hamiltonian,  and  let 
denote  the  boundary  Hamiltonian,  where 


Hz>  =  ld  +  <P»f>»  (B~7) 

and 

HB  =  IB  +  <A,g>.  (B-8) 


"^The  arguments  of  G,  Lg,  A,  g,  f,  K,  LD,  p,  and  x 
will  be  dropped  for  notational  simplicity. 
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Substitution  of  the  distributed  and  the  boundary  Hamil¬ 
tonians  into  Equation  B-6  yields 


J(u]=G+  I  K  dQ  +  [  [Hq-<X,  |~|  > ] dt 

■In  Jt  a 


{ Jn  iv<p.  H>i«> dt  • 


(B-9) 


From  Equation  B-8  the  difference  J[u]-J[u*]  is  given  by 


J[u]-J[u*]=G-G*+ 


(K-K*)dfi 


fH  -H*-<X,  f£|  -  f^-l  >]dt 

T  b  b  3t|3n  at  |3tJ 


« 


p)  y 


5x*  . 

— >  J  dJl 


dt. 


(B-10) 


Now  perturb  the  control  u*  by  letting  u=u*+ey,  and 
let  x-x*+tV  denote  the  trajectory  corresponding  to  u. 
Expanding  the  terms  in  Equation  B-9  about  u*  and  x*  in  a 
Taylor's  series  yields 


G-G*=e< 


3G 

3x 


,  v>  I 

* 

an 

+  0  (e)  , 


(B-ll) 


,*> 


+  0(e) 


(B-12) 
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3Ht 

HB-HB=e<^T 


k-1  3H 
+  e  E  <i 


B 


^3 -  /  *  i> 

3fi  i=1  3  r1  r 


an 


3hr 

+e<3^  'e> 


+  0(c) 


(B-13) 


3Hd  k  3H 

v^irH  ',,>+e.e1<s*7 

I  * 


gkm  9  Hr) 

/  — »•>+£  s——, n>+0  (e) , 
,  ar1  3ud 


(B-14) 


and 


3x  _9x^_  _  ax 

at  "at  “  eat  ' 


(B-15) 


where 


uT= [nT/ST] ,  and  lim 
e+0 


•  0. 


Substitution  of  Equations  B-ll,  B-12,  B-13,  B-14,  and 
B-15  into  B-10,  and  integrating  by  parts  yields 


u»  J.iol-JJa*.1.  .  <f§  -x,T> 

G  d  X 

e-*-0 


an 


i  -p»M  d« 


(B-16 ) 


t  k-1  .  ^i  3H„  .  3  H„ 

(<  z  (_i)i  9  [  D_]+x+  B  v> 

JT  i=0  ar1  3x  i  +  1  3x 


+  < 


k-1  k=i 
+  Z  <  Z  (-1) 
i=2  j=0 


-ni  £ 


r 

8H, 


an 


3*  k'’rk-l- 


an 


3H_ 


i‘a-Jr~j5“I+  ax~‘B~'  '  T  i-iH  }dt 
ar1  3x  i  +  j  9X  i-1  r1  1  'an 


■[: 


8Hn  k  i  a1  9Hn  a«  9Hn 

+  *  i.i*'11  +  S£T>+X'n>1*  dt' 
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Optimality  Conditions 

It  follows  from  Equation  B-15  that  in  order  for 
T  T 

tug  ,  ug  ]  to  be  optimal,  it  is  necessary  that  there  exist 
functions  p*(r,t)  and  \*(t)  such  that 

a.  x*(r,t)  is  a  solution  of  the  distributed  state  system 


§f  =  f,  x(r,tQ)  =  (r) ,  |£  =  g  (B-17) 

b.  p*(r,t)  is  a  solution  of  the  distributed  costate 
system 


k 

E  (-D 
i=l 


p(r,Tf)  = 


(B-18) 


c.  X*(t)  is  a  solution  of  the  ordinary  differential 
equation 


dX 

dt 


k-1 

E 

i=0 


(~l)i 


(B-19) 


X(Tf) 


3G 

Dx 

t=Tf 
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d.  The  transversal! ty  conditions  are  satisfied,  i.e. , 


3H. 


3H, 


3x 


■0, 


T x 


3fl 


i-1 


3n 


k-i  ,  aj 
+  I  (-1)3  - 
j=0  3r 


3H_ 


73* 


i+j 


(B-20) 


1=2  # 3, * • • f k* 1  * 


e.  The  gradient  of  J  vanishes#  i.e.# 

g(u)T=tgD(u)T,gB(u)T]-[{3HD/3uD)T  ,  (B-21) 

(3HB/3uB)T]-[0T,0,r]. 


